Abstract

In various applications, the effect of errors in gradient-based iterations is of particular importance when seeking saddle points of the Lagrangian function associated with constrained convex optimization problems. Of particular interest here are problems arising in power control applications, where network utility is maximized subject to minimum signal-to-interference-plus-noise ratio (SINR) constraints, maximum interference constraints, maximum received-power constraints, or simultaneous minimum and maximum SINR constraints. Especially when the gradient iterations are executed in a distributed fashion, imperfect exchanges among the link nodes may result in erroneous gradient vectors. To assess and cope with such errors, two running averages (ergodic sequences) with complementary strengths are formed from the iterates generated by the perturbed saddle point method. Under the assumptions of problem convexity and error boundedness, bounds on the constraint violation and the suboptimality per iteration index are derived. The two types of running averages are tested on a spectrum sharing problem with minimum and maximum SINR constraints, as well as maximum interference constraints.



I. Introduction 

The gradient projection method has well-documented merits for finding saddle points of the Lagrangian 
function associated with general constrained convex optimization problems, and specifically those arising 
in power control applications. Of particular importance in this context is the effect of errors (perturbations) 
in the gradient vectors on the performance of the method. 

In these applications, a set of links comprising transmitter-receiver pairs communicate over the same frequency band, causing interference to each other. The resulting signal-to-interference-plus-noise ratio (SINR) is the main indicator of each link's quality of service (QoS). The aim is to regulate the transmit powers so that a sum of utility functions of the SINR across links is maximized, under constraints that may include minimum QoS support in terms of SINR, maximum interference constraints, maximum receive-power constraints, or a combination of application-dependent minimum and maximum QoS expressed in terms of SINR; see e.g., [1]–[5].

The constrained nature of the problem naturally leads to solution methods based on the Lagrangian function. A dual method cannot be readily employed, because the coupling of power variables renders minimization of the Lagrangian function at each time slot difficult. Instead, gradients of the Lagrangian function with respect to the primal variables and the Lagrange multipliers are used to find a saddle point of the Lagrangian; see e.g., [6, Sec. 3.5], [7], [8], and also [?] for applications to power control. For these gradients to be made available at the various nodes, some exchange of information is necessary, either through the reversed network or through message passing; see e.g., [9, Ch. 6], [?] for the former, and [10], [?] for the latter. As exchanges may be imperfect, they result in erroneous gradients.

Errors in optimization methods have been the object of considerable research. Specifically, errors in the gradient method for finding saddle points have been addressed in the context of stochastic approximation [11], [12], where a strictly convex objective, diminishing stepsizes, or linearly independent active constraint gradients are typically assumed. Primal methods with erroneous gradient or subgradient vectors have also received attention. Bounded deterministic errors are considered in [13], [14]. Random errors are studied in [15], and in [16] using running averages (ergodic sequences). Various deterministic or random error models are considered in [17, Ch. 4 and Sec. 5.5].

In the context of power control, network utility maximization in the absence of power-coupling constraints has been pursued in [18] using the stochastic approximation framework and diminishing stepsizes. Unconstrained optimization with logarithmic utilities and errors in the knowledge of link gains has been studied in [10]. Even in the absence of errors, ergodic sequences are formed in order to obtain constrained Nash equilibria in power control games [19].

This paper deals with the assessment and mitigation of errors in the gradient method for saddle points, in contrast to [13]–[17], which deal with primal methods. Relative to stochastic approximation methods [11], [12], the errors here are modeled differently, namely as bounded but otherwise arbitrary, and explicit per-iteration bounds are developed on the induced constraint violation and suboptimality. Due to their general setting, the results are also of value within the broader theme of error analysis in optimization theory.

The specific contributions and organization of the paper are as follows. Section II presents two collections of power control problems: (a) typical problems where network utility is maximized subject to minimum QoS, maximum interference, or maximum receive-power constraints; and (b) contemporary spectrum sharing problems. In the latter, primary users (license holders) and secondary users simultaneously access a licensed band, or all users are allowed to use an unlicensed band [20]. Simultaneous minimum and maximum QoS constraints in terms of SINR must be imposed in both cases; maximum interference constraints are also incorporated here, extending related results of [?].

Section III begins with a description of the gradient method for saddle points, which serves as a unifying solver for the various power control formulations. Message passing or the reversed network approach is employed to distribute the gradient iterations. Then, the focus turns to error-resilient gradient-based iterations and their analysis. The impact of errors is mitigated and analyzed through running averages of the iterates obtained by the perturbed saddle point method. Two types of averages are considered, namely: (a) one with equal weights for all iterates; and (b) one in which past iterates are weighted via exponentially decaying weights. The analysis applies when a constant stepsize is used, which is desirable in resource allocation algorithms, and under persistent (nonvanishing) errors. Each type of averaging has its own merits. Explicit bounds are derived per iteration on the constraint violation and the suboptimality induced by errors. Related results, but for the first type of averaging and without errors, can be found in [3].

Finally, numerical tests are presented in Section IV, and conclusions with pointers to future directions in Section V.

II. Unifying Power Control Formulations 

A. Typical Power Control Problems 

Consider the power control problem for a single-channel (i.e., single-carrier) network in which users share the same frequency band, e.g., as in CDMA. Assuming a peer-to-peer operating setup, there is a set of links $\mathcal{M} := \{1,\ldots,M\}$, where each link $i \in \mathcal{M}$ comprises a dedicated transmitter (Tx$_i$) wishing to communicate with a corresponding receiver (Rx$_i$). The terms pair, user, and link will be used interchangeably. Let $h_{ij}$ denote the (power) path gain from Tx$_i$ to Rx$_j$, which is assumed invariant. The path gain models the relationship between the transmitted and received power, and captures any signal processing operation taking place at the transmitter or the receiver. Also, let $n_i$ denote the noise power at Rx$_i$; $p_i$ the transmit-power of Tx$_i$; and $p_i^{\max}$ the maximum power budget Tx$_i$ can afford, i.e., $0 \le p_i \le p_i^{\max}$. The received SINR $\gamma_i$ at Rx$_i$ is a function of the powers $p := [p_1,\ldots,p_M]^T$ given by

$$\gamma_i(p) = \frac{h_{ii}\,p_i}{n_i + \sum_{k \neq i} h_{ki}\,p_k}, \qquad i \in \mathcal{M}. \tag{1}$$






IEEE TRANSACTIONS ON SIGNAL PROCESSING (SUBMITTED) 



Model (1) is general, and can accommodate a number of (de)modulation schemes [9, Ch. 4]. For future use, define the vectors $p^{\max} := [p_1^{\max},\ldots,p_M^{\max}]^T$ and $\gamma := [\gamma_1,\ldots,\gamma_M]^T$; and the matrix $A = [a_{ij}]$ with $a_{ij} := h_{ji}/h_{ii}$ if $i \neq j$, and $a_{ii} := 0$.
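The SINR model (1) and the normalized gain matrix $A$ can be sanity-checked numerically. The following Python sketch is illustrative only: the helper names `sinr` and `gain_matrix` and the two-link numbers are assumptions, while the convention `h[i][j]` = gain from Tx$_i$ to Rx$_j$ follows the text. Dividing the numerator and denominator of (1) by $h_{ii}$ gives the equivalent form $\gamma_i = p_i/(n_i/h_{ii} + \sum_k a_{ik}p_k)$, which the last line evaluates for comparison.

```python
def sinr(h, n, p):
    """SINR (1): gamma_i = h_ii p_i / (n_i + sum_{k != i} h_ki p_k); h[i][j] is Tx_i -> Rx_j."""
    M = len(p)
    return [h[i][i] * p[i] / (n[i] + sum(h[k][i] * p[k] for k in range(M) if k != i))
            for i in range(M)]


def gain_matrix(h):
    """Normalized gain matrix A with a_ij = h_ji / h_ii for i != j and a_ii = 0."""
    M = len(h)
    return [[0.0 if i == j else h[j][i] / h[i][i] for j in range(M)] for i in range(M)]


# Illustrative two-link example (values are arbitrary assumptions).
h = [[1.0, 0.2], [0.1, 0.5]]
n = [0.1, 0.05]
p = [1.0, 2.0]
gamma = sinr(h, n, p)
A = gain_matrix(h)
# Equivalent form of (1) after dividing numerator and denominator by h_ii.
gamma_alt = [p[i] / (n[i] / h[i][i] + sum(A[i][k] * p[k] for k in range(2))) for i in range(2)]
```

Both expressions agree, which is the form exploited later when SINR constraints are written in terms of $A$.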

The utility associated with each link $i \in \mathcal{M}$ will be described by a generic function $u_i(\gamma_i)$. The goal is to maximize the sum of all link utilities subject to QoS, interference, or receive-power constraints. For all optimization problems described next, the following two operating conditions are adopted.

C1. Utilities $u_i(\gamma_i)$, $i = 1,\ldots,M$, are chosen so that: (a) they are strictly increasing and twice continuously differentiable; and (b) $-\gamma_i u_i''(\gamma_i)/u_i'(\gamma_i) \ge 1$ for $\gamma_i > 0$ ($'$ denotes differentiation).

C2. The noise power satisfies $n_i > 0$ for all $i$; and the gain matrix $A$ is irreducible; see e.g., [9, Def. A.27].

Condition C1 is standard in the power control literature to guarantee that $u_i(\gamma_i)$ is concave in $\ln\gamma_i$; see e.g., [9, Ch. 5], [?]. It also implies the fairness condition $\lim_{\gamma_i \to 0^+} u_i(\gamma_i) = -\infty$, which guarantees that nonzero power is allocated to all users. Examples of utilities satisfying C1 are $u_i(\gamma_i) = \ln\gamma_i$, $u_i(\gamma_i) = \gamma_i^{\alpha}/\alpha$ with $\alpha < 0$, and $u_i(\gamma_i) = \ln[\ln(1+\gamma_i)]$. Furthermore, the irreducibility of $A$ in C2 is also a standard assumption in power control problems [1]. It means that the users cannot be divided into two disjoint groups without at least one user in one group being interfered with by a user in the other group.
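Condition C1(b) can be checked for the three example utilities in closed form. The sketch below is an illustration, not part of the original development: it hard-codes the first and second derivatives of each utility and evaluates $-\gamma u''(\gamma)/u'(\gamma)$ on a grid. For $u = \ln\gamma$ the ratio is exactly 1, for $\gamma^\alpha/\alpha$ it equals $1-\alpha$, and for $\ln[\ln(1+\gamma)]$ it equals $\gamma(L+1)/((1+\gamma)L)$ with $L = \ln(1+\gamma)$, which is at least 1 because $\gamma \ge \ln(1+\gamma)$. The exponent $\alpha = -1$ and the grid of SINR values are arbitrary assumptions.

```python
import math


def ratio_log(g):
    # u = ln g: u' = 1/g, u'' = -1/g**2
    return -g * (-1.0 / g**2) / (1.0 / g)


def ratio_power(g, alpha=-1.0):
    # u = g**alpha / alpha with alpha < 0: u' = g**(alpha-1), u'' = (alpha-1) g**(alpha-2)
    return -g * (alpha - 1.0) * g ** (alpha - 2.0) / g ** (alpha - 1.0)


def ratio_loglog(g):
    # u = ln(ln(1+g)) with L = ln(1+g): u' = 1/((1+g) L), u'' = -(L+1)/((1+g)**2 L**2)
    L = math.log1p(g)
    u1 = 1.0 / ((1.0 + g) * L)
    u2 = -(L + 1.0) / ((1.0 + g) ** 2 * L ** 2)
    return -g * u2 / u1


grid = [10.0 ** k for k in range(-3, 4)]  # SINR values from 1e-3 to 1e3
for ratio in (ratio_log, ratio_power, ratio_loglog):
    # C1(b): -gamma u''/u' >= 1 (small slack for floating-point rounding)
    assert all(ratio(g) >= 1.0 - 1e-12 for g in grid), ratio.__name__
```

The logarithm sits exactly on the boundary of C1(b), which is why the inequality must be non-strict.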

Power control must also account for the following constraints; see e.g., [1]–[3].

1) Minimum QoS support: To ensure minimum QoS levels, the QoS per link $i$ is generically described here by a function $U_i(\gamma_i)$, which can, e.g., represent rate when $U_i(\gamma_i) = \ln(1+\gamma_i)$. If $U_i(\gamma_i)$ is chosen strictly increasing, then minimum constraints on $U_i$ map one-to-one to SINR bounds $\gamma_i \ge \gamma_i^{\min}$.

2) Maximum interference constraints: The interference-plus-noise (IpN) term $n_i + \sum_{k \neq i} h_{ki}p_k$ at Rx$_i$ [cf. (1)] is constrained not to exceed $q_i^{\max}$. In this way, for instance, weak links are protected from excessive interference.

3) Maximum receive-power constraints: The total received power $h_{ii}p_i + n_i + \sum_{k \neq i} h_{ki}p_k$ at Rx$_i$ is constrained not to exceed $s_i^{\max}$. Wireless mesh networks are one application: different wireless systems sharing the same bandwidth may each connect to a different node of the mesh network, and regulating the received power helps accommodate more such systems.

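Not all previously mentioned types of constraints are necessarily present simultaneously. However, retaining them all leads to the following unifying form of the power control problem:

$$
\begin{aligned}
\max_{0 \le p \le p^{\max}} \quad & \sum_{i=1}^{M} u_i(\gamma_i) && \text{(2a)}\\
\text{subj. to} \quad & \gamma_i^{\min} \le \gamma_i, && \forall i \in \mathcal{M} \quad \text{(2b)}\\
& n_i + \sum_{k \neq i} h_{ki}\,p_k \le q_i^{\max}, && \forall i \in \mathcal{M} \quad \text{(2c)}\\
& h_{ii}\,p_i + n_i + \sum_{k \neq i} h_{ki}\,p_k \le s_i^{\max}, && \forall i \in \mathcal{M} \quad \text{(2d)}
\end{aligned}
$$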



If the transformation $p_i = e^{y_i}$ is applied, then (2) becomes a convex optimization problem in $y := [y_1,\ldots,y_M]^T$. Specifically, the objective becomes a concave function of $y$; see e.g., [9, Ch. 6]. The power constraints become $y \in \mathcal{Y}$, where $\mathcal{Y} := \{y \in \mathbb{R}^M \mid y_i \le \ln p_i^{\max},\ i = 1,\ldots,M\}$. Constraints (2b)–(2d) take the form of upper bounds on sums of exponentials of affine functions of $y$, which are convex; see e.g., [22, Ch. 3].
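To illustrate the convexification, the minimum-SINR constraint (2b) for link $i$ can be divided through by $h_{ii}e^{y_i}$ and written as $g_i(y) := \gamma_i^{\min}\big(n_i e^{-y_i} + \sum_{k\neq i} h_{ki}e^{y_k - y_i}\big)/h_{ii} - 1 \le 0$, a sum of exponentials of affine functions of $y$. The Python sketch below is a numerical sanity check rather than a proof; the function names `g_min_sinr` and `sinr_from_y` and all numbers are illustrative assumptions. It checks on random points that $g_i(y) \le 0$ holds exactly when $\gamma_i \ge \gamma_i^{\min}$, and that $g_i$ satisfies the midpoint convexity inequality.

```python
import math
import random


def g_min_sinr(y, i, h, n, gamma_min):
    """Convex form of (2b) for link i in the variables y_k = ln p_k (h[k][i]: Tx_k -> Rx_i)."""
    M = len(y)
    s = n[i] * math.exp(-y[i]) + sum(h[k][i] * math.exp(y[k] - y[i]) for k in range(M) if k != i)
    return gamma_min[i] * s / h[i][i] - 1.0


def sinr_from_y(y, i, h, n):
    p = [math.exp(v) for v in y]
    return h[i][i] * p[i] / (n[i] + sum(h[k][i] * p[k] for k in range(len(p)) if k != i))


rng = random.Random(0)
h = [[1.0, 0.2, 0.1], [0.3, 0.8, 0.2], [0.1, 0.1, 0.9]]
n = [0.1, 0.1, 0.1]
gamma_min = [1.0, 1.0, 1.0]
for _ in range(1000):
    y = [rng.uniform(-3.0, 1.0) for _ in range(3)]
    w = [rng.uniform(-3.0, 1.0) for _ in range(3)]
    mid = [(a + b) / 2 for a, b in zip(y, w)]
    for i in range(3):
        gi = g_min_sinr(y, i, h, n, gamma_min)
        # Same feasibility verdict as the original SINR constraint (skip near-boundary ties).
        if abs(gi) > 1e-9:
            assert (gi <= 0) == (sinr_from_y(y, i, h, n) >= gamma_min[i])
        # Midpoint convexity: g(mid) <= (g(y) + g(w)) / 2, up to rounding.
        rhs = (gi + g_min_sinr(w, i, h, n, gamma_min)) / 2
        assert g_min_sinr(mid, i, h, n, gamma_min) <= rhs + 1e-12
```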

B. The Spectrum Sharing Paradigm 

The spectrum sharing paradigm has been put forward by the Federal Communications Commission, and allows a frequency band to be utilized by more users than the licensed ones [20]. Within this paradigm, two models are of interest here.

• Flexible primary model: License holders, called primary users (PUs), allow secondary users (SUs) 
to access the spectrum. The SUs pay a fee to the PUs, and receive QoS in a range determined by this 
fee. This is oftentimes called a secondary market. The PUs should also be guaranteed a minimum 
QoS or maximum interference level. 

• Open sharing model: All users are considered primary, and cooperate in order to achieve efficient 
resource management. To this end, they voluntarily set lower and upper bounds on the received QoS. 

It is apparent that in both models, the resource allocation task must account for minimum and maximum 
bounds on the QoS, and possibly maximum interference constraints. Formulations are presented next for 
single-channel and multi-channel settings. 

1) Single-Channel Networks: Recalling the notation of Subsection II-A, upper and lower bounds on QoS expressed in terms of $U_i(\gamma_i)$ map one-to-one to SINR bounds; i.e., $U_i(\gamma_i) \in [U_i(\gamma_i^{\min}), U_i(\gamma_i^{\max})] \Leftrightarrow \gamma_i \in [\gamma_i^{\min}, \gamma_i^{\max}]$. Moreover, an upper bound $q_i^{\max}$ on the interference $n_i + \sum_{k \neq i} h_{ki}p_k$ inflicted on link $i$ may be considered. Putting things together, the associated power control problem amounts to



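$$
\begin{aligned}
\max_{0 \le p \le p^{\max}} \quad & \sum_{i=1}^{M} u_i(\gamma_i) && \text{(3a)}\\
\text{subj. to} \quad & \gamma_i^{\min} \le \gamma_i \le \gamma_i^{\max}, && \forall i \in \mathcal{M} \quad \text{(3b)}\\
& n_i + \sum_{k \neq i} h_{ki}\,p_k \le q_i^{\max}, && \forall i \in \mathcal{M} \quad \text{(3c)}
\end{aligned}
$$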



It is now worth checking how the general formulation (3) can describe the design objectives of the spectrum sharing models described previously.

Application 1 (Flexible primary with minimum QoS guarantees). The set of users is divided into a set of PUs $\mathcal{M}_p$ and a set of SUs $\mathcal{M}_s$, i.e., $\mathcal{M} = \mathcal{M}_p \cup \mathcal{M}_s$. The PUs set bounds on the received QoS of the SUs, based on the fee that the latter pay. Moreover, minimum SINR guarantees are also included for the PUs. Thus, constraint (3b) is specialized to $\gamma_i^{\min} \le \gamma_i$ for all $i \in \mathcal{M}_p$, and $\gamma_i^{\min} \le \gamma_i \le \gamma_i^{\max}$ for all $i \in \mathcal{M}_s$. Constraint (3c) is absent. The SUs may or may not have minimum SINR guarantees, depending on the agreement between them and the PUs.

Application 2 (Flexible primary with interference protection). The PU-SU paradigm is adopted as in Application 1, but now the PUs are protected against excessive interference. Constraint (3c) takes the form $n_i + \sum_{k \neq i} h_{ki}p_k \le q_i^{\max}$ for all $i \in \mathcal{M}_p$, while (3b) becomes $\gamma_i^{\min} \le \gamma_i \le \gamma_i^{\max}$ for all $i \in \mathcal{M}_s$.

Application 3 (Open sharing model). In the open sharing model, all users are peers, and cooperate in order to achieve efficient allocation of the network resources. Specifically, each user can set bounds matching its own QoS requirements, leading to (3b). Constraint (3c) is absent.

Note that the transformation $p_i = e^{y_i}$, which has been the "workhorse" of efficient power control as seen in the previous subsection, does not readily convexify (3). The reason is that the second constraint in (3b), $\gamma_i \le \gamma_i^{\max}$, becomes a superlevel set of a sum-exp function. To facilitate the solution of (3) through convex optimization, the following operating condition is adopted on top of C1 and C2.

C3. If every user has a maximum SINR constraint, there is no power vector $p$ with $0 \le p \le p^{\max}$ and $n_i + \sum_{k \neq i} h_{ki}p_k \le q_i^{\max}$ for all $i \in \mathcal{M}$ such that the resulting SINRs satisfy $\gamma_i = \gamma_i^{\max}$ for all $i \in \mathcal{M}$.

The condition is satisfied automatically for flexible primary models, because primary users in this case do not set upper bounds on the received QoS. For the open sharing model, the condition can be easily checked using the standard power control algorithm of [23].
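A minimal sketch of such a check follows, assuming that the standard algorithm of [23] is the classical SINR-target fixed-point iteration $p \leftarrow \mathrm{diag}(\gamma^{\max})(Ap + \eta)$ with $\eta_i := n_i/h_{ii}$; this is an assumption here, and the exact variant in [23] may differ. The helper name `c3_holds` and the tolerances are hypothetical. Any $p$ attaining $\gamma_i = \gamma_i^{\max}$ for all $i$ is a fixed point of this map. Started at $p = 0$, the iterates increase monotonically and stay below every nonnegative fixed point, so if an iterate exceeds the power budget, no admissible $p$ can balance all SINRs at $\gamma^{\max}$; otherwise the limit is the only candidate, and C3 fails exactly when it also meets every interference cap.

```python
def c3_holds(h, n, gamma_max, p_max, q_max, tol=1e-12, max_iter=100_000):
    """Return True if no p with 0 <= p <= p_max and IpN <= q_max attains gamma = gamma_max.

    h[i][j] is the gain from Tx_i to Rx_j and n[i] the noise power at Rx_i.
    """
    M = len(n)
    p = [0.0] * M
    for _ in range(max_iter):
        # Fixed-point (SINR-balancing) map: p_i <- gamma_i^max (n_i + sum_{k != i} h_ki p_k) / h_ii.
        p_new = [gamma_max[i] * (n[i] + sum(h[k][i] * p[k] for k in range(M) if k != i)) / h[i][i]
                 for i in range(M)]
        # Started at p = 0, the iterates increase monotonically and never exceed any
        # nonnegative fixed point, so crossing a budget rules out every SINR-balancing p.
        if any(p_new[i] > p_max[i] for i in range(M)):
            return True
        step = max(abs(a - b) for a, b in zip(p_new, p))
        p = p_new
        if step < tol:
            break
    else:
        raise RuntimeError("iteration did not settle; increase max_iter")
    ipn = [n[i] + sum(h[k][i] * p[k] for k in range(M) if k != i) for i in range(M)]
    # C3 fails only if the SINR-balancing powers also respect every interference cap.
    return any(ipn[i] > q_max[i] for i in range(M))
```

For instance, with two symmetric links ($h_{ii} = 1$, $h_{ij} = 0.1$, $n_i = 0.1$) and $\gamma_i^{\max} = 2$, the balancing powers are $p_i = 0.25$, so C3 fails under a budget of 1 but holds under a budget of 0.2.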
Having clarified the operating conditions, we proceed to relax the nonconvex problem (3). To this end, let $q_i$ denote an auxiliary variable associated with link $i$, upper-bounding the interference-plus-noise (IpN) term $n_i + \sum_{k \neq i} h_{ki}p_k$. Collecting all variables in $q := [q_1,\ldots,q_M]^T$ and the constraints $q_i^{\max}$ in $q^{\max} := [q_1^{\max},\ldots,q_M^{\max}]^T$, consider the following relaxed version of (3):

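$$
\begin{aligned}
\max_{0 \le p \le p^{\max},\ q \le q^{\max}} \quad & \sum_{i=1}^{M} u_i\big(h_{ii}\,p_i\,q_i^{-1}\big) && \text{(4a)}\\
\text{subj. to} \quad & \gamma_i^{\min} \le h_{ii}\,p_i\,q_i^{-1} \le \gamma_i^{\max}, && \forall i \in \mathcal{M} \quad \text{(4b)}\\
& q_i \ge n_i + \sum_{k \neq i} h_{ki}\,p_k, && \forall i \in \mathcal{M} \quad \text{(4c)}
\end{aligned}
$$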

It is worth noting that the upper bound $q_i^{\max}$ is now placed on the variable $q_i$.

Problem (4) can be transformed into an equivalent convex optimization problem. To this end, apply the one-to-one change of variables $p_i = e^{y_i}$ and $q_i = e^{z_i}$, and define $y := [y_1,\ldots,y_M]^T$ and $z := [z_1,\ldots,z_M]^T$. Under C1, problem (4) is equivalent to a convex optimization problem in $(y,z)$. The convex set constraint for $y$ is $\mathcal{Y}$ as in the formulations of Subsection II-A, while for $z$ it is $\mathcal{Z} := \{z \in \mathbb{R}^M \mid z_i \le \ln q_i^{\max},\ \forall i \in \mathcal{M}\}$.
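In the variables $(y,z)$, the SINR surrogate $h_{ii}p_i/q_i$ in (4a)-(4b) becomes $h_{ii}e^{y_i - z_i}$, and (4c) becomes $e^{-z_i}\big(n_i + \sum_{k\neq i} h_{ki}e^{y_k}\big) - 1 \le 0$. The Python sketch below (helper names and numbers are illustrative assumptions) shows why (4) is a relaxation of (3): any feasible $q_i$ exceeding the actual IpN yields a surrogate SINR strictly below the true SINR, while the tight choice $q_i = n_i + \sum_{k\neq i} h_{ki}p_k$, which Lemma 1 asserts holds at the optimum, recovers the true SINR exactly.

```python
import math


def ipn(p, i, h, n):
    """Interference-plus-noise at Rx_i: n_i + sum_{k != i} h_ki p_k."""
    return n[i] + sum(h[k][i] * p[k] for k in range(len(p)) if k != i)


def ipn_constraint(y, z, i, h, n):
    """Convex form of (4c): e^{-z_i} (n_i + sum_{k != i} h_ki e^{y_k}) - 1 <= 0."""
    p = [math.exp(v) for v in y]
    return math.exp(-z[i]) * ipn(p, i, h, n) - 1.0


def relaxed_sinr(y, z, i, h):
    """SINR surrogate h_ii p_i / q_i = h_ii e^{y_i - z_i} used in (4a)-(4b)."""
    return h[i][i] * math.exp(y[i] - z[i])


h = [[1.0, 0.2], [0.3, 0.8]]
n = [0.1, 0.1]
y = [0.0, -0.5]
p = [math.exp(v) for v in y]
# Tight choice q_i = IpN_i: (4c) is active and the surrogate equals the true SINR.
z_tight = [math.log(ipn(p, i, h, n)) for i in range(2)]
# Loose choice q_i > IpN_i: (4c) holds strictly and the surrogate underestimates the SINR.
z_loose = [v + 0.3 for v in z_tight]
```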

The ensuing lemma, proved in Appendix A, asserts that the optimal solution of (4) satisfies (4c) with equality; therefore, it is the solution of (3).

Lemma 1. If (3) is feasible and C1–C3 hold, the optimal solution $(p^*, q^*)$ of (4) satisfies

$$q_i^* = n_i + \sum_{k \neq i} h_{ki}\,p_k^*, \qquad \forall i \in \mathcal{M}. \tag{5}$$

The convexity and optimality of the relaxed problem (4) will be leveraged in Section III. The previous ideas are generalized to multi-channel networks in what follows.

2) Multi-Channel Networks: Users here may transmit over an orthogonal set of frequency bands $\mathcal{F} := \{1,\ldots,F\}$, also referred to as channels, subcarriers, or tones. The power of Tx$_i$ on channel $f$ is $p_{i,f}$, the noise power at Rx$_i$ on channel $f$ is $n_{i,f}$, and the (power) path gain from Tx$_i$ to Rx$_j$ on channel $f$ is $h_{ij,f}$. Moreover, each user adheres to a spectral mask $p_{i,f} \le p_{i,f}^{\max}$ and a maximum power budget $\sum_{f} p_{i,f} \le p_i^{\max}$. The vector $p_i := [p_{i,1},\ldots,p_{i,F}]^T$ contains the power loadings of user $i$. Then, each user's power must lie in

$$\mathcal{P}_i := \Big\{p_i \ge 0 \ \Big|\ p_{i,f} \le p_{i,f}^{\max}\ \forall f \in \mathcal{F};\ \sum_{f \in \mathcal{F}} p_{i,f} \le p_i^{\max}\Big\}. \tag{6}$$

The received SINR at Rx$_i$ on channel $f$ is $\gamma_{i,f} := h_{ii,f}\,p_{i,f}/\big(n_{i,f} + \sum_{k \neq i} h_{ki,f}\,p_{k,f}\big)$. Similar to the single-channel case, $A_f$ is the gain matrix for channel $f$.
The aim is to formulate the power control problem for a multi-channel network incorporating diverse QoS constraints as well as interference constraints. To this end, utility functions of the SINR per user and channel are adopted, namely, $u_{i,f}(\gamma_{i,f})$, $U_{i,f}(\gamma_{i,f})$, and $V_{i,f}(\gamma_{i,f})$. QoS is collected for each user by summing each type of utility function across channels. The first one is used to form the objective function to be maximized, the second one to impose a lower bound $U_i^{\min}$, and the third one to set an upper bound $V_i^{\max}$. Regarding the interference constraints, there are two ways to generalize (3c): imposing either (a) individual bounds $q_i^{\max}$ per user across channels, or (b) individual bounds $q_{i,f}^{\max}$ per user and channel. The choice between the two possibly depends on the bandwidth of each channel; case (a) may be more suitable for smaller channel bandwidths, and (b) for larger channel bandwidths.

All in all, the optimization problem generalizing (3) to multi-channel networks is



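$$
\begin{aligned}
\max_{\{p_i \in \mathcal{P}_i\}_{i \in \mathcal{M}}} \quad & \sum_{i=1}^{M}\sum_{f=1}^{F} u_{i,f}(\gamma_{i,f}) && \text{(7a)}\\
\text{subj. to} \quad & \sum_{f=1}^{F} U_{i,f}(\gamma_{i,f}) \ge U_i^{\min} \ \text{ and } \ \sum_{f=1}^{F} V_{i,f}(\gamma_{i,f}) \le V_i^{\max}, && i \in \mathcal{M} \quad \text{(7b)}\\
& \sum_{f=1}^{F}\Big(n_{i,f} + \sum_{k \neq i} h_{ki,f}\,p_{k,f}\Big) \le q_i^{\max} \ \text{ or } \ n_{i,f} + \sum_{k \neq i} h_{ki,f}\,p_{k,f} \le q_{i,f}^{\max}\ \ \forall f \in \mathcal{F}, && i \in \mathcal{M} \quad \text{(7c)}
\end{aligned}
$$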



Clearly, (7) can be specialized appropriately to obtain multi-channel counterparts of Applications 1–3.

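Similar to the single-channel case, a relaxation of (7) is formulated. Let $q_i := [q_{i,1},\ldots,q_{i,F}]^T$ be the local IpN vector. Moreover, define the following constraint sets for $q_i$, corresponding to the two versions of (7c):

$$\mathcal{Q}_i := \Big\{q_i \ge 0 \ \Big|\ \sum_{f \in \mathcal{F}} q_{i,f} \le q_i^{\max}\Big\} \quad \text{or} \quad \mathcal{Q}_i := \big\{q_i \ge 0 \ \big|\ q_{i,f} \le q_{i,f}^{\max}\ \forall f \in \mathcal{F}\big\}. \tag{8}$$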



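To form the relaxation of (7), the SINR $\gamma_{i,f}$ is replaced by the ratio $h_{ii,f}\,p_{i,f}/q_{i,f}$, and the local IpN constraint $q_{i,f} \ge n_{i,f} + \sum_{k \neq i} h_{ki,f}\,p_{k,f}$ is introduced. The transformation is now $p_{i,f} = e^{y_{i,f}}$ and $q_{i,f} = e^{z_{i,f}}$ for all $i$ and $f$; define $y_i := [y_{i,1},\ldots,y_{i,F}]^T$ and $z_i := [z_{i,1},\ldots,z_{i,F}]^T$, and let $y$ and $z$ be the vectors collecting $y_i$ and $z_i$ for all $i \in \mathcal{M}$, respectively. The transformed constraint set $\mathcal{Y}_i$ for the powers $y_i$ comprises the inequalities $y_{i,f} \le \ln p_{i,f}^{\max}$ and $\sum_{f \in \mathcal{F}} e^{y_{i,f}} \le p_i^{\max}$ [cf. (6)], while $\mathcal{Y} := \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_M$. The transformed constraint set $\mathcal{Z}_i$ for the variables $z_i$ takes the form $\sum_{f=1}^{F} e^{z_{i,f}} \le q_i^{\max}$ or $z_{i,f} \le \ln q_{i,f}^{\max}$ [cf. (8)]; define also $\mathcal{Z} := \mathcal{Z}_1 \times \cdots \times \mathcal{Z}_M$.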

The following operating conditions, corresponding to C1–C3, are adopted.
C1'. Utilities $u_{i,f}(\gamma_{i,f})$ and $U_{i,f}(\gamma_{i,f})$, $i = 1,\ldots,M$, satisfy C1. Utilities $V_{i,f}(\gamma_{i,f})$, $i = 1,\ldots,M$, are chosen so that: (a) they are strictly increasing and twice continuously differentiable; and (b) they are concave and satisfy $-\gamma_{i,f}V_{i,f}''(\gamma_{i,f})/V_{i,f}'(\gamma_{i,f}) \le 1$ for $\gamma_{i,f} > 0$.

C2'. It holds that $n_{i,f} > 0$ for all $i$ and $f$, and the gain matrix $A_f$ is irreducible for all $f$.

C3'. If every user has a maximum utility constraint, there are no $p_i$, $q_i$ with $p_i \in \mathcal{P}_i$, $q_i \in \mathcal{Q}_i$ such that the maximum QoS constraint holds with equality for all $i$ [cf. (7b) with $\gamma_{i,f}$ replaced by $h_{ii,f}\,p_{i,f}/q_{i,f}$].

It is easy to verify that the transformed relaxed problem is a convex optimization problem under C1'. Moreover, the relaxation incurs no loss of optimality under C1'–C3', analogously to Lemma 1.

III. Unifying Algorithm and Error Analysis 

All power control problems described in the previous section share the characteristic that they include QoS, interference, or received-power constraints, which couple the power variables. A dual method for finding the solution is not readily applicable, because this coupling renders maximizing the Lagrangian difficult. For this reason, the optimal power allocation is found by a gradient method, which yields a saddle point of the associated Lagrangian function.

To facilitate the development, the power control problems of Section II can be put in the following generic form¹

$$\min_{x \in \mathcal{X}}\ f(x) \tag{9a}$$

$$\text{subj. to}\ \ g(x) \le 0. \tag{9b}$$

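The optimization variable $x$ is $y$ for problem (2), and $\mathcal{X} = \mathcal{Y}$. In the case of (3) or (7), $x$ collects the respective $y$ and $z$, and similarly, $\mathcal{X}$ is $\mathcal{Y} \times \mathcal{Z}$. The association of the functions $f(x): \mathbb{R}^N \to \mathbb{R}$ and $g(x): \mathbb{R}^N \to \mathbb{R}^K$ with the objective and constraint functions of the power control problems, for an appropriate choice of dimensions $N$ and $K$, is evident (e.g., $f$ is the negative of the sum of utilities, since (9) is written as a minimization).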

Let $\zeta$ denote the vector of Lagrange multipliers corresponding to the constraints (9b). The Lagrangian function of problem (9) is

$$L(x,\zeta) = f(x) + \zeta^T g(x). \tag{10}$$

The following assumption is adopted for problem (9).

¹All references to (2) will in fact be to its convex equivalent after the transformation $p_i = e^{y_i}$. All references to (3) and (7) will be to their relaxed convex versions with variables $(y,z)$. Moreover, it will be clear when the symbol $f$ is used to denote a channel in the multi-channel power control problem, or the objective function in the generic optimization problem.
Assumption 1. Functions $f(x)$ and $g(x)$ are convex and continuous on $\mathcal{X}$, and the set $\mathcal{X}$ is convex, closed, and bounded. The set of optimal solutions $\mathcal{X}^*$ is nonempty, closed, and bounded. Moreover, Slater's constraint qualification holds, i.e., there is an $\bar{x} \in \mathcal{X}$ such that $g(\bar{x}) < 0$.

Assumption 1 is clearly satisfied by the power control problems of Section II. In particular, convexity is ensured by condition C1. The optimal powers are nonzero due to C1 and remain constrained by the maximum power budgets; hence, the entries of the optimal $y^*$ are bounded. In fact, straightforward application of Weierstrass' theorem [24, Prop. 2.1.1] implies that the optimal sets are closed and bounded in all power control problems of Section II. Slater's constraint qualification was not explicitly mentioned in the previous section, but it is a natural assumption for the power control problems. The following observation is an immediate consequence of Assumption 1; see e.g., [24, Ch. 6] for the related theory ($\|\cdot\|$ denotes the Euclidean norm, and $\mathbb{R}_+$ the nonnegative reals).

Observation. The set of optimal primal solutions $\mathcal{X}^*$ and the set of optimal dual solutions (optimal Lagrange multipliers) $\mathcal{D}^*$ are convex, as solution sets of convex optimization problems. The set $\mathcal{D}^*$ is also nonempty, closed, and bounded. In particular, there is a constant $B_\zeta > 0$ such that $\|\zeta^*\| \le B_\zeta$ for any dual optimal $\zeta^*$. Furthermore, for any convex set $\mathcal{D}$ with $\mathcal{D}^* \subseteq \mathcal{D} \subseteq \mathbb{R}_+^K$, the pairs $(x^*, \zeta^*)$ with any $x^* \in \mathcal{X}^*$ and $\zeta^* \in \mathcal{D}^*$ are exactly the saddle points of the Lagrangian function (10) over $\mathcal{X} \times \mathcal{D}$.

Based on this observation, solving (9) amounts to finding a saddle point of the associated Lagrangian function over $\mathcal{X} \times \mathcal{D}$. Let $\mathcal{P}_{\mathcal{X}}$ and $\mathcal{P}_{\mathcal{D}}$ denote the projection onto $\mathcal{X}$ and $\mathcal{D}$, respectively; and $\nabla_x L(x,\zeta)$ and $\nabla_\zeta L(x,\zeta)$ the gradient of the Lagrangian with respect to $x$ and $\zeta$, respectively. Consider the following gradient projection algorithm, indexed by $t = 0, 1, 2, \ldots$, with $\alpha > 0$ a constant stepsize:

$$x(t+1) = \mathcal{P}_{\mathcal{X}}\big[x(t) - \alpha \nabla_x L(x(t),\zeta(t))\big] \tag{11a}$$

$$\zeta(t+1) = \mathcal{P}_{\mathcal{D}}\big[\zeta(t) + \alpha \nabla_\zeta L(x(t),\zeta(t))\big]. \tag{11b}$$

A desirable feature of algorithm (11) is the constant stepsize, which enables adaptation in resource allocation algorithms where the parameters may vary slowly. Another feature is the projection, which here ensures that the transmit powers remain within budget at each and every iteration.

To perform the updates (11) in a distributed fashion suitable for power control, the partial derivatives of the Lagrangian must be made available at the nodes where they are needed through exchange of information, which may entail errors. The ensuing Subsection III-A outlines methods of exchanging information among nodes, and asserts the convergence of (11) in the error-free case. The latter is relevant if sufficiently strong error control codes are used to ensure error-free exchanges. Then, Subsection III-B pursues error-resilient iterations and their performance when exchanges are imperfect. Subsection III-C explains how to select certain performance-critical parameters involved in the context of power control. The material in Subsection III-A unifies ideas from [1] and [?], and extends them to include the more general formulations of Subsection II-B. The results of Subsection III-B are of interest also outside the context of power control, as they are developed for a general optimization algorithm.
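A minimal Python sketch of the projected primal-dual iteration (11) follows, on a hypothetical one-dimensional instance of (9) that is not one of the power control problems: $f(x) = x^2$, $g(x) = 1 - x$, $\mathcal{X} = [-5,5]$, and $\mathcal{D} = [0,10]$, all of which are illustrative assumptions. Here $L(x,\zeta) = x^2 + \zeta(1-x)$, whose unique saddle point is $(x^*,\zeta^*) = (1,2)$, and the box $\mathcal{D}$ contains $\mathcal{D}^* = \{2\}$ as required by the Observation. With the constant stepsize $\alpha = 0.1$, the iterates approach the saddle point.

```python
def clip(v, lo, hi):
    """Projection onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def primal_dual(alpha=0.1, iters=500, x0=0.0, zeta0=0.0):
    """Iterations (11) for L(x, zeta) = x**2 + zeta * (1 - x) on X = [-5, 5], D = [0, 10]."""
    x, zeta = x0, zeta0
    for _ in range(iters):
        grad_x = 2.0 * x - zeta      # dL/dx at (x(t), zeta(t))
        grad_zeta = 1.0 - x          # dL/dzeta = g(x) at (x(t), zeta(t))
        # Simultaneous update: descent in x, ascent in zeta, each followed by a projection.
        x, zeta = clip(x - alpha * grad_x, -5.0, 5.0), clip(zeta + alpha * grad_zeta, 0.0, 10.0)
    return x, zeta


x_final, zeta_final = primal_dual()
```

In this example the objective is strictly convex, which is the simplest situation covered by the convergence discussion that follows; a too large stepsize could make the same loop oscillate or diverge.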

A. Error-free exchanges 

Iterations (11) do not converge in general, even under convexity assumptions. In the context of power control, the objective (2a) is strictly convex under C1 and C2 [1]. For problem (3), strict convexity of the objective function does not hold in general. (For instance, the utility $u_i(\gamma_i) = \ln\gamma_i$ under the transformation $p_i = e^{y_i}$, $q_i = e^{z_i}$ yields the objective $\sum_{i=1}^{M}(\ln h_{ii} + y_i - z_i)$, which is linear.) However, the following weaker property is shown to hold in Appendix A.

Lemma 2. Under C1–C2 (or C1'–C2' for the multi-channel problems) and Slater's constraint qualification, the following property holds for the Lagrangian functions of (3) and (7) for any $\zeta^* \in \mathcal{D}^*$:

$$L(x^*,\zeta^*) < L(x,\zeta^*) \qquad \forall x \in \mathcal{X},\ x \neq x^*. \tag{12}$$

This property is called stability of the saddle points with respect to $x$ in [?]. Clearly, (12) holds automatically if $f(x)$ is strictly convex.

Either strict convexity of $f(x)$ or (12) alone ensures convergence of (11) by immediate application of [?]; see also [?] and [?] for related results in power control. With $\Omega^*$ denoting the set of optimal primal solutions-Lagrange multipliers $\mathcal{X}^* \times \mathcal{D}^*$, and $\mathrm{dist}(u,\Omega)$ the distance of a point $u := (x,\zeta)$ from a closed convex set $\Omega$, the convergence claim is summarized next.²

Lemma 3. Suppose C1–C3 (or C1'–C3' for the multi-channel problems) and Slater's constraint qualification hold, implying Assumption 1 and (12). For every $\epsilon$ and $\delta$ with $0 < \epsilon < \delta$, there exist positive $\alpha_0(\epsilon,\delta)$ and $t_0(\epsilon,\delta)$ such that for any stepsize $0 < \alpha < \alpha_0(\epsilon,\delta)$, and any initial point $u(0) \in \mathcal{X} \times \mathcal{D}$ with $\mathrm{dist}(u(0),\Omega^*) < \delta$, the iterates $u(t)$ in (11) satisfy $\mathrm{dist}(u(t),\Omega^*) < \epsilon$ for all $t > t_0(\epsilon,\delta)/\alpha$. This further implies that $\mathrm{dist}(x(t),\mathcal{X}^*) < \epsilon$ for all $t > t_0(\epsilon,\delta)/\alpha$.

This lemma asserts that the power allocation will remain as close as desired to the optimal one after a number of iterations using a sufficiently small stepsize; the stepsize and the number of iterations depend on the desired proximity of $x(t)$ to $\mathcal{X}^*$.

²The distance from a closed convex set is defined as $\mathrm{dist}(u,\Omega) := \inf_{w \in \Omega}\|w - u\|$ [24, p. 88].

Next, the distributed implementation of (11) is considered. The updates of $y_i$, $z_i$, and the associated Lagrange multipliers pertaining to link $i$ are performed at Tx$_i$. For this to be possible, Tx$_i$ needs to know the partial derivatives of the Lagrangian function with respect to $y_i$ and $z_i$. It is easy to verify that all these derivatives, except $\partial L(u)/\partial y_i$, depend on the channel $h_{ii}$ and the current SINR value, namely, $\gamma_i(t) = h_{ii}e^{y_i(t)}/\big(n_i + \sum_{j \neq i} h_{ji}e^{y_j(t)}\big)$. The latter can be fed back from Rx$_i$, where it is measured, to the corresponding Tx$_i$.

On the other hand, $\partial L(u)/\partial y_i$ for each of the formulations of Section II involves a sum that depends on quantities non-local to link $i$. This term has the form $e^{y_i(t)}\sum_{j \neq i} h_{ij}\,\xi_j(t)$, where $\xi_j$ takes the following values.

1) For the case (2), let $\mu_j$, $\lambda_j$, and $\nu_j$, $j = 1,\ldots,M$, be the Lagrange multipliers corresponding to the convex form of constraints (2b)–(2d), respectively. Then,

$$\xi_j(t) = \underbrace{\;\cdots\;}_{A} + \underbrace{\;\cdots\;}_{B} + \underbrace{\;\cdots\;}_{C} \tag{13}$$

where the terms $A$, $B$, $C$ correspond to the parts of the Lagrangian for constraints (2b)–(2d), respectively.
2) For the case (3),

$$\xi_j(t) = \mu_j(t)\,e^{-z_j(t)} \tag{14}$$

where $\mu_j$, $j = 1,\ldots,M$, are the Lagrange multipliers corresponding to the convex form of constraints (4c), that is, $e^{-z_j}\big(n_j + \sum_{k \neq j} h_{kj}e^{y_k}\big) - 1 \le 0$. The multi-channel case (7) is analogous, where now the non-local part of $\partial L(u)/\partial y_{i,f}$ has the form $e^{y_{i,f}}\sum_{j \neq i} h_{ij,f}\,\mu_{j,f}\,e^{-z_{j,f}}$, and $\mu_{j,f}$, $j = 1,\ldots,M$, $f = 1,\ldots,F$, correspond to the convex form of the local IpN constraints.

Note that $\xi_j(t)$ depends on quantities that are known to Tx$_j$, $j \neq i$. There are two ways to make $\sum_{j \neq i} h_{ij}\,\xi_j(t)$ available to Tx$_i$: (a) message passing, and (b) the reversed network operation.

In the message passing approach, Tx$_i$ acquires the cross-channels $h_{ij}$ to all other receivers through training and/or feedback. At each time slot, every Tx$_j$ must then broadcast the variable $\xi_j(t)$ to all other transmitters.

In the reversed network approach, no exchange of information among different links is required, but the links are assumed reciprocal; that is, the channel from Tx$_i$ to Rx$_j$ is identical to the channel from Rx$_j$ to Tx$_i$. The transmitters become receivers, and vice versa. To use the reversed network, rewrite $\sum_{j \neq i} h_{ij}\,\xi_j(t)$ as $\sum_{j=1}^{M} h_{ij}\,\xi_j(t) - h_{ii}\,\xi_i(t)$. The sum $\sum_{j=1}^{M} h_{ij}\,\xi_j(t)$ is then the received power at Tx$_i$ (which is a receiver for the reversed network), if all receivers (which are transmitters for the reversed network) transmit symbols with power $\xi_j(t)$. For the receivers of the links to know the quantities $\xi_j(t)$, they need to perform updates of all local quantities, starting from the same initialization as their corresponding transmitter.
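The identity behind the reversed network can be checked numerically. In the Python sketch below, the function names, gains, and multiplier values are hypothetical. Message passing forms $\sum_{j\neq i} h_{ij}\xi_j$ directly from the broadcast values $\xi_j$ and the known cross gains, whereas the reversed network measures the total power $\sum_{j=1}^{M} h_{ij}\xi_j$ received at Tx$_i$ when every Rx$_j$ transmits with power $\xi_j$ over reciprocal channels, and then subtracts the own contribution $h_{ii}\xi_i$, which Tx$_i$ can compute locally. The values $\xi_j = \mu_j e^{-z_j}$ follow (14).

```python
import math


def nonlocal_message_passing(h, xi, i):
    # Tx_i knows the cross gains h_ij and receives the broadcast values xi_j, j != i.
    return sum(h[i][j] * xi[j] for j in range(len(xi)) if j != i)


def received_power_reversed(h, xi, i):
    # Reciprocity: the gain from Rx_j back to Tx_i equals h_ij, so Tx_i observes the
    # superposition sum_j h_ij xi_j when every Rx_j transmits with power xi_j.
    return sum(h[i][j] * xi[j] for j in range(len(xi)))


def nonlocal_reversed_network(h, xi, i):
    # Remove the own-link contribution h_ii xi_i, known locally at Tx_i.
    return received_power_reversed(h, xi, i) - h[i][i] * xi[i]


h = [[1.0, 0.2, 0.05], [0.3, 0.8, 0.1], [0.1, 0.15, 0.9]]
mu = [0.5, 1.2, 0.0]          # illustrative multipliers of (4c)
z = [-1.0, -0.7, -1.2]        # illustrative z_j = ln q_j
xi = [m * math.exp(-zj) for m, zj in zip(mu, z)]   # xi_j = mu_j e^{-z_j}, cf. (14)
```

The two functions return the same value up to rounding; they differ only in what has to be communicated (broadcast messages versus a single received-power measurement).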

For the multi-channel case, the previously described operations are performed on a per-channel basis. In either of the two distributed implementations, the value of $\partial L(u)/\partial y_i$ [or $\partial L(u)/\partial y_{i,f}$] may contain errors, induced by perturbed exchanges among links. This motivates the study of (11) in a more general setting in the ensuing subsection, where the gradients of the Lagrangian are perturbed by errors.

B. Imperfect Exchanges 

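Suppose $\nabla_x L(x(t),\zeta(t))$ is perturbed by error $r(t)$ at iteration $t$, while $\nabla_\zeta L(x(t),\zeta(t))$ is perturbed by error $e(t)$. Then, iterations (11) become

$$x(t+1) = \mathcal{P}_{\mathcal{X}}\big[x(t) - \alpha\big(\nabla_x L(x(t),\zeta(t)) + r(t)\big)\big] \tag{15a}$$

$$\zeta(t+1) = \mathcal{P}_{\mathcal{D}}\big[\zeta(t) + \alpha\big(\nabla_\zeta L(x(t),\zeta(t)) + e(t)\big)\big]. \tag{15b}$$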
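To see why nonvanishing errors call for averaging, the following Python sketch runs (15) on a hypothetical one-dimensional instance: $L(x,\zeta) = x^2 + \zeta(1-x)$ with saddle point $(1,2)$, $\mathcal{X} = [-5,5]$, $\mathcal{D} = [0,10]$, $\alpha = 0.1$, and a bounded but persistent error $r(t) = 0.5(-1)^t$, $e(t) = 0$. All of these choices are illustrative assumptions. A linearization around the saddle point predicts that $x(t)$ ends up oscillating with amplitude of about $0.028$ around $x^* = 1$ instead of converging.

```python
def clip(v, lo, hi):
    """Projection onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def perturbed_primal_dual(alpha, iters, r, e):
    """Iterations (15) for L(x, zeta) = x**2 + zeta * (1 - x) with gradient errors r(t), e(t)."""
    x, zeta = 0.0, 0.0
    xs = [x]
    for t in range(iters):
        gx = 2.0 * x - zeta + r(t)      # perturbed dL/dx
        gz = 1.0 - x + e(t)             # perturbed dL/dzeta
        x, zeta = clip(x - alpha * gx, -5.0, 5.0), clip(zeta + alpha * gz, 0.0, 10.0)
        xs.append(x)
    return xs


xs = perturbed_primal_dual(0.1, 5000, r=lambda t: 0.5 * (-1) ** t, e=lambda t: 0.0)
```

After the transient has died out, consecutive iterates sit on opposite sides of $x^* = 1$, so the last iterate alone never settles, even though the error is bounded.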

Iterations (15), together with the performance analysis pursued below, have the following desirable attributes:
a1) Persistent (i.e., nonvanishing) errors are accounted for.
a2) A constant stepsize is used, which enables adaptability.
a3) The error-induced constraint violation and suboptimality can be bounded per iteration.

Key to establishing a1)–a3) is the use of suitable averages of the iterates $\{x(t)\}$. These help cope with nonvanishing errors and constant stepsizes, both of which keep the terms $\alpha r(t)$ and $\alpha e(t)$ in (15) away from zero. Consider the following running average [8]

$$ \bar{x}(t) := \frac{1}{t}\sum_{i=0}^{t-1} x(i), \quad t = 1, 2, \ldots \tag{16} $$

and also its exponentially weighted counterpart with the so-called forgetting factor $\beta \in (0,1)$

$$ \bar{x}_\beta(t) := \frac{\sum_{i=0}^{t-1}\beta^{t-1-i}x(i)}{\sum_{i=0}^{t-1}\beta^{t-1-i}} = \frac{\sum_{i=0}^{t-1}\beta^{-i}x(i)}{\sum_{i=0}^{t-1}\beta^{-i}}, \quad t = 1, 2, \ldots \tag{17} $$

inspired by the exponentially decaying window approaches used in adaptive signal processing [25]. Clearly, averaging as in (17) places more weight on recent iterates. Eq. (16) may be viewed as obtained from (17) in the limit $\beta \to 1$. The sequences $\{\bar{x}(t)\}$ and $\{\bar{x}_\beta(t)\}$, which are formed from the iterates $\{x(t)\}$, are often referred to as ergodic sequences. Being convex combinations of the iterates $\{x(0), x(1), \ldots, x(t-1)\}$, the running averages $\bar{x}(t)$ and $\bar{x}_\beta(t)$ belong to the convex set $\mathcal{X}$ for all $t \geq 1$. It is important to remark that both running averages can be computed efficiently in a recursive fashion; e.g., $\bar{x}(t+1)$ can be computed from $\bar{x}(t)$ and $x(t)$, and similarly for $\bar{x}_\beta(t+1)$.
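A minimal sketch of the recursive computation, assuming the iterates arrive one at a time as NumPy arrays. For (17) it keeps the normalizer $W_t = \sum_{i=0}^{t-1}\beta^{t-1-i}$, which satisfies $W_t = \beta W_{t-1} + 1$; the function name and this bookkeeping are our own choices, not taken from the paper.

```python
import numpy as np


def running_averages(xs, beta):
    """Recursively compute x_bar(t) in (16) and x_bar_beta(t) in (17) for t = 1..len(xs).

    xs[i] plays the role of x(i). Only the previous averages and one scalar weight
    W_t = sum_{i=0}^{t-1} beta^(t-1-i) are carried between steps.
    """
    bars, betas = [], []
    x_bar = x_beta = None
    weight = 0.0
    for t, x in enumerate(xs, start=1):       # x is x(t-1)
        x = np.asarray(x, dtype=float)
        if t == 1:
            x_bar, x_beta, weight = x.copy(), x.copy(), 1.0
        else:
            x_bar = ((t - 1) * x_bar + x) / t                      # (16): uniform weights
            new_weight = beta * weight + 1.0                       # W_t = beta W_{t-1} + 1
            x_beta = (beta * weight * x_beta + x) / new_weight     # (17)
            weight = new_weight
        bars.append(x_bar.copy())
        betas.append(x_beta.copy())
    return bars, betas
```

Each step costs a constant number of vector operations, so both averages can run alongside (15) at every node without storing past iterates.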

The main result is that the sequences $\{\bar{x}(t)\}$ and $\{\bar{x}_\beta(t)\}$ converge to a neighborhood of the optimal solution of (O. It is further of interest to estimate these neighborhoods, specifically to assess the constraint violation and the suboptimality, and to study how the constraint violation and the objective value evolve across iterations. It will be seen that the two averaging schemes have complementary merits. Specifically, $\{\bar{x}(t)\}$ in (16) may converge to a smaller neighborhood than $\{\bar{x}_\beta(t)\}$ in (17). On the other hand, $\{\bar{x}_\beta(t)\}$ reduces the constraint violation much faster than $\{\bar{x}(t)\}$, which is desirable in practice.

In order to proceed with the development, the following assumptions are adopted in addition to Assumption 1.

Assumption 2. The iterates $\{x(t)\}_{t=0}^{\infty}$ and $\{\zeta(t)\}_{t=0}^{\infty}$ generated by (15) are bounded. Specifically, there exists a constant $B_\zeta > B_\zeta^*$ such that

$$ \|\zeta(t)\| \leq B_\zeta, \quad t = 0, 1, 2, \ldots \tag{18} $$

As a consequence of the boundedness of the iterates, the gradient vectors are also bounded, i.e., there is a constant $B_L > 0$ such that

$$ \|\nabla_x L(x(t),\zeta(t))\| \leq B_L, \quad \|\nabla_\zeta L(x(t),\zeta(t))\| \leq B_L, \quad t = 0, 1, 2, \ldots \tag{19} $$

Assumption 3. The error sequences are bounded, i.e., there are constants $\bar{r} > 0$ and $\bar{e} > 0$ such that

$$ \|r(t)\| \leq \bar{r}, \quad \|e(t)\| \leq \bar{e}, \quad t = 0, 1, 2, \ldots \tag{20} $$

Assumption 2 automatically holds if the sets $\mathcal{X}$ and $\mathcal{D}$ are compact, due to the projections onto those sets [cf. (15)]. In power control problems, these sets can be selected to be compact, because the sets of optimal primal variables and optimal Lagrange multipliers are bounded, as explained earlier. In particular, for a constant $\varrho > 0$, the following set can be used as a compact $\mathcal{D}$ for the projection:

$$ \mathcal{D}_{\mathrm{ball}} := \{\zeta \geq 0 \mid \|\zeta\| \leq B_\zeta^* + \varrho\} \tag{21} $$

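where $B_\zeta^*$ is introduced in Assumption 1. It is useful to express $B_\zeta^*$ using the vector $\tilde{x}$ satisfying Slater's condition; see also [8]. In particular, let $\tilde{d}$ denote a lower bound on the optimal dual value of (O, and let $g_k(x)$, $k = 1, \ldots, K$, denote the entries of $g(x)$. Then, a value for $B_\zeta^*$ is

$$ B_\zeta^* = \frac{f(\tilde{x}) - \tilde{d}}{\min_{1 \leq k \leq K}\{-g_k(\tilde{x})\}}. \tag{22} $$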
For the general analysis presented here, it is not necessary to assume that the sets $\mathcal{X}$ and $\mathcal{D}$ are compact. In particular, the analysis holds if $\mathcal{D}$ is chosen as any convex set $\mathcal{D} \subseteq \mathbb{R}_+^K$ containing the set $\mathcal{D}^*$ of optimal Lagrange multipliers, or if $\mathcal{D} = \mathcal{D}_{\mathrm{ball}}$ in particular. In the former case, we can consider under Assumption 2 that the iterates $\{\zeta(t)\}$ remain in the bounded set $\mathcal{D}_{\mathrm{ball}}$ with $\varrho = B_\zeta - B_\zeta^*$ (with a slight abuse of the notation $\varrho$). It is also worth stressing that when $\mathcal{X}$ and $\mathcal{D}$ are not compact, or when stronger assumptions on the convexity-concavity of the Lagrangian function do not hold, it is natural to assume bounded iterates or
gradients; see e.g., HI, HI, lfl4Tl . 

Assumptions 1 and 2 imply that there is a constant $B_d$ such that

$$ \|x(t) - x^*\| \leq B_d, \quad \forall t \geq 0, \ x^* \in \mathcal{X}^*. \tag{23} $$

Assumption 3 does not hold when, e.g., the errors are randomly drawn from a distribution with infinite support. Such a case may be better handled in the context of stochastic approximation [11], [12]. Note, though, that strict convexity of the objective function, diminishing stepsizes, or linearly independent gradients of the active constraints are typically needed for convergence in such a framework. Without invoking such assumptions, Assumption 3 allows the constraint violation and the suboptimality to be characterized as functions of the iteration index. Furthermore, note that Assumption 3 holds if the errors are random deviates from a distribution with bounded support (e.g., uniform), with arbitrary correlation across time (iteration number) or across the entries of the error vector. In this case, the results established here hold with probability 1.

Assumptions 1–3 hold throughout, unless otherwise stated. Let $f^*$ denote the optimal value of (O, and $x^* \in \mathcal{X}^*$ an arbitrary primal optimal solution. The first result provides bounds on the norm of the constraint violation $\|[g(\bar{x}(t))]^+\|$ and on the objective value $f(\bar{x}(t))$ at each iteration index $t$, using the initialization variables and the quantities defined in (18)–(23); here $[\cdot]^+$ denotes projection onto $\mathbb{R}_+^K$. All proofs of the results in this subsection can be found in Appendix B.

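Proposition 1. Under Assumptions 1–3, the sequence $\{\bar{x}(t)\}$ satisfies for $t \geq 1$:

(i)
$$ \|[g(\bar{x}(t))]^+\| \leq \frac{\|\zeta(0)\|^2 + 2\|\zeta(0)\|B_\zeta + B_\zeta^2 + \|x(0) - x^*\|^2}{2\alpha\varrho t} + \frac{2B_\zeta\bar{e} + B_d\bar{r}}{\varrho} + \frac{\alpha(B_L + \bar{e})^2}{2\varrho} + \frac{\alpha(B_L + \bar{r})^2}{2\varrho} \tag{24} $$

(ii)
$$ f(\bar{x}(t)) \leq f^* + \frac{\|x(0) - x^*\|^2}{2\alpha t} + \frac{\|\zeta(0)\|^2}{2\alpha t} + B_d\bar{r} + B_\zeta\bar{e} + \frac{\alpha(B_L + \bar{r})^2}{2} + \frac{\alpha(B_L + \bar{e})^2}{2} \tag{25} $$

(iii)
$$ f(\bar{x}(t)) \geq f^* - B_\zeta^*\|[g(\bar{x}(t))]^+\|. \tag{26} $$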
In part (i), there are terms that decrease with $t$ and terms that are constant. The constant terms capture the final error level in the constraint violation, while the terms decreasing with $t$ quantify the rate of decrease. Part (ii) provides an upper bound on the difference $f(\bar{x}(t)) - f^*$, and comments similar to those for part (i) hold regarding the rate of decrease of this bound. In part (iii), the quantity $\|[g(\bar{x}(t))]^+\|$ can be substituted from part (i) to obtain a lower bound on $f(\bar{x}(t)) - f^*$, whose magnitude also decreases with $t$ in the previously described fashion. It is also worth stressing that, apart from the dependence on the iteration index, parts (i)–(iii) provide bounds in which the impact of the errors (through $\bar{e}$ and $\bar{r}$) is explicitly accounted for. It follows further from part (iii) that $f(\bar{x}(t))$ may be smaller than $f^*$, because $\|[g(\bar{x}(t))]^+\|$ might not go to zero. This improvement in the objective value is explained by the infeasibility that is allowed by part (i). Note also that Proposition 1 reduces to [8, Prop. 5.1] with perfect exchanges, i.e., when $\bar{r} = 0$ and $\bar{e} = 0$.

Now define

$$ S_t := \sum_{i=0}^{t-1} \beta^{-i}. \tag{27} $$
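The following sketch checks numerically two elementary facts about (27) for a few sample values of $\beta$ and $t$: the geometric-series closed form $S_t = \beta^{-(t-1)}(1-\beta^t)/(1-\beta)$, which shows that $S_t$ grows exponentially in $t$, and the telescoping identity $\frac{1}{S_t}\sum_{i=1}^{t-1}(\beta^{-i}-\beta^{-(i-1)}) = (1-\beta)(1-\beta^{t-1})/(1-\beta^t)$ behind the $\beta$-dependent terms of the next result. Function names are illustrative.

```python
import math


def S(t, beta):
    """S_t in (27), summed directly: sum_{i=0}^{t-1} beta^(-i)."""
    return sum(beta ** (-i) for i in range(t))


def S_closed_form(t, beta):
    """Geometric-series closed form: beta^(-(t-1)) * (1 - beta^t) / (1 - beta)."""
    return beta ** (-(t - 1)) * (1 - beta ** t) / (1 - beta)


def weight_increment_ratio(t, beta):
    """(1/S_t) * sum_{i=1}^{t-1} (beta^(-i) - beta^(-(i-1))), a telescoping sum."""
    return sum(beta ** (-i) - beta ** (-(i - 1)) for i in range(1, t)) / S(t, beta)
```

Since every term $\beta^{-i}$ with $i \geq 1$ exceeds 1, $S_t > t$ for $t \geq 2$, which is why terms divided by $S_t$ vanish faster than terms divided by $t$.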

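The result for the sequence $\{\bar{x}_\beta(t)\}$ corresponding to Proposition 1 is as follows.

Proposition 2. Under Assumptions 1–3, the sequence $\{\bar{x}_\beta(t)\}$ satisfies for $t \geq 2$:

(i)
$$ \|[g(\bar{x}_\beta(t))]^+\| \leq \frac{\|\zeta(0)\|^2 + 2\|\zeta(0)\|B_\zeta + B_\zeta^2 + \|x(0) - x^*\|^2}{2\alpha\varrho S_t} + \frac{4B_\zeta^2 + B_d^2}{2\alpha\varrho}\,\frac{(1-\beta)(1-\beta^{t-1})}{1-\beta^t} + \frac{2B_\zeta\bar{e} + B_d\bar{r}}{\varrho} + \frac{\alpha(B_L + \bar{e})^2}{2\varrho} + \frac{\alpha(B_L + \bar{r})^2}{2\varrho} \tag{28} $$

(ii)
$$ f(\bar{x}_\beta(t)) \leq f^* + \frac{\|x(0) - x^*\|^2 + \|\zeta(0)\|^2}{2\alpha S_t} + \frac{B_\zeta^2 + B_d^2}{2\alpha}\,\frac{(1-\beta)(1-\beta^{t-1})}{1-\beta^t} + B_d\bar{r} + B_\zeta\bar{e} + \frac{\alpha(B_L + \bar{r})^2}{2} + \frac{\alpha(B_L + \bar{e})^2}{2} \tag{29} $$

(iii)
$$ f(\bar{x}_\beta(t)) \geq f^* - B_\zeta^*\|[g(\bar{x}_\beta(t))]^+\|. \tag{30} $$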



Comments similar to those following Proposition 1 hold for the terms appearing on the right-hand sides of (28)–(30). The dependence on the iteration index, on the maximum errors in the gradients, and now on the forgetting factor is explicit. In these bounds, $S_t$ tends to infinity as $t \to \infty$, so the terms divided by $S_t$ tend to zero. Note that the quantity $(1-\beta^{t-1})/(1-\beta^t)$ is smaller than 1 for all $t \geq 2$, and tends to 1 as $t \to \infty$. Hence, this quantity may be replaced by 1 to obtain simpler bounds in (28) and (29).

Useful insights can be gained by comparing the bounds on the constraint violation in parts (i) of Propositions 1 and 2. Observe that the running averages $\bar{x}_\beta(t)$ achieve an error level exceeding that of $\bar{x}(t)$ by a quantity proportional to $1-\beta$. This difference in the error levels is therefore controlled by the choice of $\beta$, and can be made small because in practice $\beta$ is chosen close to 1. It is also interesting to see that the constraint violation of $\bar{x}_\beta(t)$ decreases faster than that of $\bar{x}(t)$. The reason is that the denominator $S_t$ in (28) grows exponentially in $t$ [cf. (27)], whereas the denominator $t$ in (24) grows only linearly. Similar comments can be made for the suboptimality in the corresponding parts (ii) and (iii) of the propositions.

IEEE TRANSACTIONS ON SIGNAL PROCESSING (SUBMITTED)
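To see this trade-off numerically, the sketch below evaluates the suboptimality bounds of parts (ii) of Propositions 1 and 2, assuming they take the forms $A/t + C$ and $A/S_t + \frac{B_\zeta^2+B_d^2}{2\alpha}\frac{(1-\beta)(1-\beta^{t-1})}{1-\beta^t} + C$, with $A = (\|x(0)-x^*\|^2 + \|\zeta(0)\|^2)/(2\alpha)$ and $C$ collecting the error-dependent constants. All numerical values are hypothetical.

```python
def common_terms(alpha, dx0_sq, dz0_sq, B_d, B_zeta, B_L, r_bar, e_bar):
    """A and C shared by both suboptimality bounds.

    A = (||x(0) - x*||^2 + ||zeta(0)||^2) / (2 alpha)
    C = B_d r_bar + B_zeta e_bar + alpha ((B_L + r_bar)^2 + (B_L + e_bar)^2) / 2
    """
    A = (dx0_sq + dz0_sq) / (2 * alpha)
    C = B_d * r_bar + B_zeta * e_bar + alpha * ((B_L + r_bar) ** 2 + (B_L + e_bar) ** 2) / 2
    return A, C


def gap_uniform(t, A, C):
    """Bound on f(x_bar(t)) - f*: A / t + C."""
    return A / t + C


def gap_beta(t, A, C, alpha, beta, B_zeta, B_d):
    """Bound on f(x_bar_beta(t)) - f*: A / S_t + extra(t) + C."""
    inv_S_t = beta ** (t - 1) * (1 - beta) / (1 - beta ** t)  # 1/S_t, computed without overflow
    extra = (B_zeta ** 2 + B_d ** 2) / (2 * alpha) * (1 - beta) * (1 - beta ** (t - 1)) / (1 - beta ** t)
    return A * inv_S_t + extra + C


A, C = common_terms(0.06, 4.0, 1.0, 2.0, 3.0, 5.0, 0.01, 0.01)
```

With these values, the initialization-dependent term of the weighted bound is already smaller at $t = 2$, while for very large $t$ the weighted bound exceeds the uniform one by about $(B_\zeta^2+B_d^2)(1-\beta)/(2\alpha)$, a gap that shrinks as $\beta \to 1$.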

Recall that the results so far have been established either by assuming that the iterates $\zeta(t)$ are projected onto the nonnegative orthant but remain bounded, or by forcing them to be bounded through projection onto a set $\mathcal{D} = \mathcal{D}_{\mathrm{ball}}$ of the form (21) for some fixed $\varrho > 0$. In the latter case, where a projection onto a bounded set is chosen, it may be desirable to project onto a box instead. This is certainly preferable in distributed implementations of the power control algorithms, because projection onto a box decouples across the entries of $\zeta$, whereas projection onto a ball requires the norm of the whole vector. Note that $\mathcal{D}_{\mathrm{ball}}$ in (21) is a ball rather than a box. It is possible to consider instead the following box:

$$ \mathcal{D}_{\mathrm{box}} := \{\zeta \geq 0 \mid \|\zeta\|_\infty \leq B_\zeta^* + \varrho\}. \tag{31} $$
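The sketch below contrasts the two projections, assuming $\zeta$ is stored as a NumPy vector and `radius` stands for $B_\zeta^* + \varrho$. Projection onto the box is entrywise clipping, so each link can project its own multipliers; projection onto the ball (nonnegative orthant intersected with a Euclidean ball centered at the origin) can be computed by clipping to the orthant and then rescaling, which needs the norm of the whole vector. Function names are illustrative.

```python
import numpy as np


def project_box(zeta, radius):
    """Projection onto {zeta >= 0 : ||zeta||_inf <= radius}.

    The set is a product of intervals [0, radius], so the projection is entrywise
    clipping and each entry can be projected locally.
    """
    return np.clip(zeta, 0.0, radius)


def project_ball(zeta, radius):
    """Projection onto {zeta >= 0 : ||zeta||_2 <= radius}.

    For the nonnegative orthant intersected with a ball centered at the origin,
    clipping to the orthant and then rescaling into the ball gives the projection;
    the rescaling needs the norm of the whole vector.
    """
    z = np.maximum(zeta, 0.0)
    norm = np.linalg.norm(z)
    return z if norm <= radius else z * (radius / norm)
```

Both functions return the unique nearest point of their set, which can be verified through the projection optimality condition $(y - p)^T(c - p) \leq 0$ for all feasible $c$.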

The following corollary asserts that, using $\mathcal{D}_{\mathrm{box}}$ for the projection, it is possible to obtain performance identical to that claimed by Propositions 1 and 2.

Corollary 1. If $\mathcal{D}_{\mathrm{box}}$ is used instead of $\mathcal{D}_{\mathrm{ball}}$ as the projection set $\mathcal{D}$ in (15b), the results of Propositions 1 and 2 still hold.

Propositions 1 and 2, along with Corollary 1, suggest that guaranteed performance can be obtained in power control algorithms impaired by errors in the gradient vectors, provided that running averages of the sequences $\{y(t)\}$ and $\{z(t)\}$ are formed.

Note that Propositions 1 and 2 characterize the performance of the running averages (16) and (17) when there are errors in the gradient vectors. It is useful to know that if there are no errors, the performance achieved by the running averages of the iterates is the same as that achieved by the iterates themselves. In the context of power control algorithms, the performance without gradient errors is given by Lemma 3. This lemma ensures that the iterates remain arbitrarily close to the set of optimal primal solutions for an appropriate choice of the stepsize. In particular, consider choosing an $\epsilon > 0$ and the appropriate stepsize $\alpha$ so that the iterates $\{x(t)\}$ satisfy $\mathrm{dist}(x(t), \mathcal{X}^*) < \epsilon$ for all $t \geq t'$, as Lemma 3 asserts. The following lemma states that the running averages $\{\bar{x}(t)\}$ and $\{\bar{x}_\beta(t)\}$ asymptotically have distance at most $\epsilon$ from the optimal set $\mathcal{X}^*$, and is reminiscent of Toeplitz's lemma [25, p. 341].

Lemma 4. Suppose the sequence $\{x(t)\}$ in $\mathbb{R}^N$ satisfies $\mathrm{dist}(x(t), \mathcal{X}^*) < \epsilon$ for all $t \geq t'$, where $t'$ is a given integer, and $\mathcal{X}^*$ is convex, closed, and bounded. Then, any limit point $\hat{x}$ of the sequence $\{\bar{x}(t)\}$ or $\{\bar{x}_\beta(t)\}$ satisfies $\mathrm{dist}(\hat{x}, \mathcal{X}^*) \leq \epsilon$.

Lemma 4 suggests that one can use $\bar{x}(t)$ or $\bar{x}_\beta(t)$ instead of $x(t)$ and still enjoy $\epsilon$-optimal convergence of the averaged power control iterates.
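A one-dimensional illustration of Lemma 4, under hypothetical choices: $\mathcal{X}^* = [0, 1]$, $\epsilon = 0.1$, $t' = 20$, and a sequence that is far from $\mathcal{X}^*$ before $t'$ and within $\epsilon$ afterwards. By convexity of the distance function, as used in the proof of Lemma 4, $\mathrm{dist}(\bar{x}(T), \mathcal{X}^*) \leq \epsilon + (t'/T)\max_{i<t'}\mathrm{dist}(x(i), \mathcal{X}^*)$ at any finite $T$; the weighted average forgets the early iterates even faster.

```python
import numpy as np


def dist_to_interval(x, lo, hi):
    """Distance from a scalar x to [lo, hi], a stand-in for dist(x, X*)."""
    return max(lo - x, 0.0, x - hi)


def final_averages(xs, beta):
    """x_bar(T) as in (16) and x_bar_beta(T) as in (17) for xs = [x(0), ..., x(T-1)]."""
    xs = np.asarray(xs, dtype=float)
    T = len(xs)
    w = beta ** (T - 1 - np.arange(T))        # weights beta^(T-1-i)
    return float(xs.mean()), float(w @ xs / w.sum())


rng = np.random.default_rng(2)
eps, t_prime, T = 0.1, 20, 2000
# Far from X* = [0, 1] before t', then strictly within eps of it.
xs = np.concatenate([np.full(t_prime, 50.0),
                     1.0 + 0.999 * eps * rng.uniform(0.0, 1.0, T - t_prime)])
x_bar, x_beta = final_averages(xs, beta=0.9)
```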

C. Selecting Convergence Parameters 

Several quantities are introduced in the previous subsections under Assumptions 1–3. Their knowledge is useful for running (15), e.g., for constructing compact sets $\mathcal{X}$ and $\mathcal{D}$, as well as for the performance analysis. In what follows, certain guidelines for choosing those quantities are provided in the power control context
of Section JI] The focus here is on the single-channel formulations under Cfl]-Cf3] 

• In order to have a compact set $\mathcal{X}$, it is necessary to know bounds on the optimal powers $p^*$ and on the optimal local IpN variables $q^*$ (when applicable). First, it should be stressed that the optimal powers are nonzero, due to CQ]. Hence, constants $p_i^{\min} > 0$ are guaranteed to exist so that

$$ p_i^{\min} \leq p_i^* \leq p_i^{\max}, \quad \forall i \in \mathcal{M}. \tag{32} $$

In practice, $p_i^{\min}$ can be selected to be a very small positive number.

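• If user $i$ has a minimum SINR constraint $\gamma_i^{\min}$, then any $p_i^{\min}$ satisfying

$$ \frac{h_{ii}p_i^{\min}}{\eta_i} < \gamma_i^{\min} \tag{33} $$

can be used as a bound in (32). Indeed, if $p_i^* < p_i^{\min}$ for some $p_i^{\min}$ satisfying (33), it holds that

$$ \frac{h_{ii}p_i^*}{\eta_i + \sum_{k \neq i} h_{ki}p_k^*} < \frac{h_{ii}p_i^{\min}}{\eta_i} < \gamma_i^{\min} \tag{34} $$

which contradicts the feasibility, and hence the optimality, of $p^*$.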

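Regarding $q^*$ [cf. ©], the optimality of the relaxation implies that

$$ \eta_i + \sum_{k \neq i} h_{ki}p_k^{\min} \leq q_i^* \leq \eta_i + \sum_{k \neq i} h_{ki}p_k^{\max}, \quad \forall i \in \mathcal{M}. \tag{35} $$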

An explicit bound on the optimal Lagrange multipliers calls for a lower bound $\tilde{d}$ on the optimal dual value of problem © [cf. (22)]. To this end, define

$$ \bar{\gamma}_i := \frac{h_{ii}p_i^{\max}}{\eta_i}. \tag{36} $$

It is clear that for problem © and any feasible $p$, it holds that $\bar{\gamma}_i \geq h_{ii}p_i/(\eta_i + \sum_{k \neq i} h_{ki}p_k)$ for all $i \in \mathcal{M}$. Similarly, for (0]) and any feasible $p$ and $q$, it holds that $\bar{\gamma}_i \geq h_{ii}p_i/q_i$ for all $i \in \mathcal{M}$. Therefore, since the utilities $U_i(\cdot)$ are selected strictly increasing and the duality gap is zero, it suffices to pick any $\tilde{d}$ satisfying

$$ \tilde{d} \leq -\sum_{i=1}^{M} U_i(\bar{\gamma}_i) \leq f(x^*) = f^*. \tag{37} $$
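These guidelines can be sketched in code for the log utilities $U_i(\gamma_i) = w_i \ln \gamma_i$ used in the numerical tests (an assumption here; any strictly increasing utility would do), with objective $f = -\sum_i U_i$. The gain convention is `h[k, i]` for the link from Tx_k to Rx_i, and all function names are hypothetical. The helpers compute the SINR upper bound of (36), the dual lower bound of (37), and a valid $p_i^{\min}$ per (33).

```python
import numpy as np


def sinr(p, h, eta):
    """SINR_i = h_ii p_i / (eta_i + sum_{k != i} h_ki p_k); h[k, i] is the gain Tx_k -> Rx_i."""
    received = h.T @ p                 # sum_k h_ki p_k at each Rx_i (includes k = i)
    direct = np.diag(h) * p
    return direct / (eta + received - direct)


def sinr_upper_bound(h, eta, p_max):
    """h_ii p_i^max / eta_i, cf. (36): no feasible power vector can exceed it."""
    return np.diag(h) * p_max / eta


def dual_lower_bound(h, eta, p_max, w):
    """Lower bound on the optimal dual value as in (37), for U_i(gamma) = w_i ln(gamma)
    (assumed log utilities) and objective f = -sum_i U_i."""
    return -np.sum(w * np.log(sinr_upper_bound(h, eta, p_max)))


def p_min_bound(h, eta, gamma_min, shrink=0.5):
    """A valid p_i^min for (32): any value with h_ii p_i^min / eta_i < gamma_i^min, cf. (33)."""
    return shrink * gamma_min * eta / np.diag(h)
```

Because each achieved SINR is at most the bound in (36) and the utilities are increasing, the returned lower bound never exceeds the objective of any feasible power vector.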

• A vector $\tilde{x}$ satisfying Slater's constraint qualification is also needed.

1) For problem (2) with only the constraints (2c) or (2d) imposed, sufficiently small powers can be selected so that these constraints hold as strict inequalities.

2) For problems including SINR constraints, the power control algorithm of [23] can be applied to return powers achieving given SINR targets, if the latter are feasible. Hence, the algorithm can be applied for (2a)–(2b) to any vector of SINRs $\tilde{\gamma}$ with $\gamma_i^{\min} < \tilde{\gamma}_i$ for all $i \in \mathcal{M}$. If those SINRs (or any other SINRs strictly greater than $\gamma_i^{\min}$ for all $i$) are achievable with powers $\tilde{p}$, then $\tilde{p}$ is the desired Slater vector. For Applications 2 and 3, the algorithm can be applied to any vector of SINRs $\tilde{\gamma}$ with $\gamma_i^{\min} < \tilde{\gamma}_i < \gamma_i^{\max}$ for all $i \in \mathcal{M}$. If these are achievable with powers $\tilde{p}$, the variables $q_i$ can be chosen with this $\tilde{p}$ so that $\gamma_i^{\min} < h_{ii}\tilde{p}_i/q_i < \gamma_i^{\max}$ for all $i \in \mathcal{M}$.
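The algorithm of [23] is not reproduced here. As a centralized stand-in (an assumption, useful mainly for offline checks), the sketch below solves the linear system $p = D(\tilde{\gamma})(Ap + \eta')$ for powers meeting SINR targets with equality, where $A_{ik} = h_{ki}/h_{ii}$ for $k \neq i$ and $\eta'_i = \eta_i/h_{ii}$ is the normalized noise, and picks targets strictly between $\gamma_i^{\min}$ and $\gamma_i^{\max}$ (their geometric mean, i.e., the midpoint in dB). The choice of targets and all function names are illustrative, and users are assumed to have both SINR limits.

```python
import numpy as np


def powers_for_sinr_targets(h, eta, gamma_target):
    """Solve p = D(gamma)(A p + eta') for the powers meeting SINR targets with equality.

    h[k, i] is the gain Tx_k -> Rx_i; A[i, k] = h[k, i] / h[i, i] for k != i and
    eta'[i] = eta[i] / h[i, i]. Returns None when rho(D(gamma) A) >= 1, in which
    case no positive solution exists.
    """
    g = np.diag(h)
    A = h.T / g[:, None]
    np.fill_diagonal(A, 0.0)
    DA = gamma_target[:, None] * A               # D(gamma) A
    if np.max(np.abs(np.linalg.eigvals(DA))) >= 1.0:
        return None
    return np.linalg.solve(np.eye(len(g)) - DA, gamma_target * eta / g)


def slater_candidate(h, eta, p_max, gamma_min, gamma_max):
    """Targets strictly inside (gamma_min, gamma_max): geometric mean (midpoint in dB).
    Returns p_tilde if it also satisfies the power limits strictly, else None."""
    gamma_tilde = np.sqrt(gamma_min * gamma_max)
    p = powers_for_sinr_targets(h, eta, gamma_tilde)
    if p is None or np.any(p >= p_max):
        return None
    return p
```

A returned `None` does not by itself prove that no Slater vector exists; it only means these particular targets are not achievable within the power limits.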

IV. Numerical Tests

Distributed power control for the flexible primary model with interference protection (cf. Application 2) and imperfect exchanges is tested numerically here. There are $M = 10$ users; the set of PUs is $\mathcal{M}_p = \{1, 2, 5, 6, 8\}$, and the set of SUs is $\mathcal{M}_s = \{3, 4, 7, 9, 10\}$. The locations of the Tx-Rx pairs are given in Table I. With $d_{ij}$ denoting the distance between Tx$_i$ and Rx$_j$, the channels follow the models $h_{ii} = 0.5\,d_{ii}^{-2.5}$ and $h_{ij} = 0.05\,d_{ij}^{-2.5}$ for $i \neq j$. The rest of the parameters are listed in Table II. Errors are considered only in the updates of $y_i$. The errors are random, i.i.d. across iterations and across users, and uniformly distributed with mean $-0.005$ and support of length 0.08. This model gives relatively significant errors, with the value of $\bar{r}$ (cf. Assumption 3) three orders of magnitude larger than the max-norm of the gradient vector at the optimal point. (This vector is easily obtained by running algorithm (11).)
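The channel and error models of this section can be reproduced as follows. The coordinates are those of Table I; the array and function names are ours, and `h[i, j]` denotes the gain from Tx_i to Rx_j, following the distance convention stated above. The uniform error with mean $-0.005$ and support of length 0.08 is taken to be uniform on $[-0.045, 0.035]$.

```python
import numpy as np

# Table I coordinates in meters: row i holds Tx_i and its peer Rx_i (i = 1..10).
TX = np.array([[21.181, 21.061], [20.508, 0.53543], [4.6367, 16.308], [5.3579, 17.061],
               [12.942, 10.246], [0.91656, 3.5594], [0.61918, 1.8744], [9.152, 0.67406],
               [1.7077, 24.352], [28.384, 22.046]])
RX = np.array([[19.755, 20.88], [21.837, -1.2724], [3.6544, 14.561], [5.2071, 18.869],
               [10.602, 8.0944], [2.7782, 2.8003], [-1.6297, 1.3382], [11.408, 0.99965],
               [0.10774, 21.941], [28.627, 23.873]])


def channel_gains(tx, rx):
    """h[i, j] = c * d_ij^(-2.5) with d_ij = |Tx_i - Rx_j|; c = 0.5 if i == j, else 0.05."""
    d = np.linalg.norm(tx[:, None, :] - rx[None, :, :], axis=2)
    c = np.where(np.eye(len(tx), dtype=bool), 0.5, 0.05)
    return c * d ** -2.5


def gradient_errors(rng, size, mean=-0.005, width=0.08):
    """I.i.d. uniform errors with the given mean and support length: U[mean - w/2, mean + w/2)."""
    return rng.uniform(mean - width / 2, mean + width / 2, size)


h = channel_gains(TX, RX)
```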

It was observed that the primal iterates as well as the Lagrange multiplier iterates were indeed bounded, so Assumption 2 was satisfied. The interference and SINR constraints, as well as the results, are presented in Table III. For the algorithm with errors in the gradients, the running averages $\bar{y}_\beta(t)$ are used to evaluate the achieved SINRs and interferences in Table III. It is seen that the constraints are accurately met.

Fig. CD shows the constraint violation for the two types of averaging, (16) and (17). Recall that the function $g(x)$ is the vector formed by stacking the constraints (14b) and (14c) for all $i \in \mathcal{M}$. It is observed
TABLE I
Coordinates of the 10 Tx-Rx pairs in meters. The Tx are deployed over a square area of side 30 m. Each Rx is located in a square area of side 5 m centered at its peer Tx, and at least 1 m away from its peer Tx. Positions were randomly selected.

i     Tx_i                  Rx_i
1     (21.181, 21.061)      (19.755, 20.88)
2     (20.508, 0.53543)     (21.837, -1.2724)
3     (4.6367, 16.308)      (3.6544, 14.561)
4     (5.3579, 17.061)      (5.2071, 18.869)
5     (12.942, 10.246)      (10.602, 8.0944)
6     (0.91656, 3.5594)     (2.7782, 2.8003)
7     (0.61918, 1.8744)     (-1.6297, 1.3382)
8     (9.152, 0.67406)      (11.408, 0.99965)
9     (1.7077, 24.352)      (0.10774, 21.941)
10    (28.384, 22.046)      (28.627, 23.873)



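TABLE II
Simulation parameters for the algorithm with imperfect exchanges.

$M = 10$, $\mathcal{M}_p = \{1, 2, 5, 6, 8\}$, $\mathcal{M}_s = \{3, 4, 7, 9, 10\}$
$U_i(\gamma_i) = w_i \ln \gamma_i$; $w_i = 1$, $i \in \mathcal{M}_p$; $w_i = 0.5$, $i \in \mathcal{M}_s$
$p_i^{\max} = 6$ dBm, $i \in \mathcal{M}_p$; $p_i^{\max} = \ldots$ dBm, $i \in \mathcal{M}_s$
$\eta_i = -41$ dBm, for all $i \in \mathcal{M}$
Initialization: $p_i = 1\%$ of $p_i^{\max}$, $z_i = \ln \eta_i$, for all $i \in \mathcal{M}$
$\mu_i = 1$, for all $i \in \mathcal{M}$; $\lambda_i = 0$, $\nu_i = \ldots$, for all $i \in \mathcal{M}_s$
$\alpha = 0.06$, $\beta = 0.9$, $\mathcal{D}$ = nonnegative orthant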



that the constraint violation for averaging (17) decreases faster than that for (16).

V. Conclusions 

This paper presented in a unified way several convex power control formulations, ranging from typical 
to contemporary ones pertaining to spectrum sharing in unlicensed bands and bands with primary and 
secondary users. The common feature is that the power control must account for various constraints 
which couple the power variables, such as maximum interference constraints, maximum receive-power
constraints, or minimum and maximum SINR constraints. The gradient method for finding saddle points 
of the associated Lagrangian function was employed. Distributed implementations of the method were 
also presented in a unified way, by identifying the non-local parts of the gradient vectors for each link. 
In order to acquire these, exchange of information among nodes takes place, which may entail errors in 
the gradient vectors. 

The impact of errors in the gradient method for saddle points was studied in a general setting. To 
TABLE III
Power control with imperfect exchanges: sum-utility (top); constraint values and achieved SINR and interference per user with and without errors (bottom). The first group of users consists of PUs, with maximum interference constraints, and the second group of SUs, with minimum (left) and maximum (right) SINR constraints.

                     Without errors   With errors
Sum-utility          26.635           26.627

User   Quantity             Constraint   Without errors   With errors
1      Interference (dBm)   -30          -38.402          -38.392
2      Interference (dBm)   -30          -39.684          -39.679
5      Interference (dBm)   -38          -38.132          -38.129
6      Interference (dBm)   -38          -37.997          -37.995
8      Interference (dBm)   -38          -38.002          -37.995
3      SINR (dB)            10           10.000           10.040
4      SINR (dB)            10           10.000           9.958
7      SINR (dB)            20           -0.001           0.012
9      SINR (dB)            10           10.00            9.975
10     SINR (dB)            20           20.00            19.989



this end, two running averages (ergodic sequences) were formed from the iterates generated by the 
method. One weighs all iterates the same, and the other uses exponentially decreasing weights for past 
iterates. Under the assumption of bounded — but otherwise arbitrary — errors, the two averaging schemes 
were shown to have complementary strengths in terms of constraint violation and suboptimality reduction. 
Future research could focus on characterizing the impact of random errors arising from probability density 
functions with infinite support, using stochastic approximation techniques. 
Appendix A
Proofs of Lemmas 1 and 2

Proof of Lemma 1. The single-channel case is proved first, in two steps. Define $\gamma^{\max} := [\gamma_1^{\max}, \ldots, \gamma_M^{\max}]^T$.

Step 1: The following claim is proved, which applies to the case where all users have maximum SINR constraints in (3). It relates the (non-)achievability of $\gamma^{\max}$ in problem ® (cf. CO) and in the relaxed problem (0]).

Claim: If there is no $p$ in the feasible set of © such that $\gamma_i = \gamma_i^{\max}$ for all $i \in \mathcal{M}$, then there are no $p, q$ in the feasible set of © such that $h_{ii}p_i/q_i = \gamma_i^{\max}$ for all $i \in \mathcal{M}$.
With $\iota^{\max} := [q_1^{\max}/h_{11}, \ldots, q_M^{\max}/h_{MM}]^T$, feasibility of the SINRs $\gamma^{\max}$ in © is written as

$$ p = D(\gamma^{\max})Ap + D(\gamma^{\max})\eta, \quad 0 \leq p \leq p^{\max}, \quad Ap + \eta \leq \iota^{\max}. \tag{38} $$

If the spectral radius of $D(\gamma^{\max})A$ satisfies $\rho(D(\gamma^{\max})A) < 1$, then the first linear system in (38) has a unique positive solution $p(\gamma^{\max}) := (I - D(\gamma^{\max})A)^{-1}D(\gamma^{\max})\eta$ [9, Th. A.51]. Since (38) is assumed not to have a solution, one of the following two mutually exclusive cases can happen: (i) $\rho(D(\gamma^{\max})A) \geq 1$; or (ii) $\rho(D(\gamma^{\max})A) < 1$ but with $p(\gamma^{\max}) \not\leq p^{\max}$ or $Ap(\gamma^{\max}) + \eta \not\leq \iota^{\max}$. (The notation $\not\leq$ means that at least one entry of the vectors does not satisfy the inequality $\leq$.) The cases of $\rho(D(\gamma^{\max})A) \geq 1$, or of $\rho(D(\gamma^{\max})A) < 1$ with $p(\gamma^{\max}) \not\leq p^{\max}$, are dealt with in @ Lemma 3].

To address the remaining case, note that achievability of $\gamma^{\max}$ in ((JJ) implies that $D^{-1}(\gamma^{\max})p \leq \iota^{\max}$. For such a $p$ to exist, it is necessary that $D^{-1}(\gamma^{\max})p(\gamma^{\max}) \leq \iota^{\max}$. But using $\rho(D(\gamma^{\max})A) < 1$ and $Ap(\gamma^{\max}) + \eta \not\leq \iota^{\max}$, it follows from the first linear system in (38) that $D^{-1}(\gamma^{\max})p(\gamma^{\max}) = Ap(\gamma^{\max}) + \eta \not\leq \iota^{\max}$, a contradiction.

Step 2: Now, (5) is proved by contradiction. The previous claim is used, and the proof follows that of El Prop. 1], noting that the additional constraint $q_i \leq q_i^{\max}$ can be easily taken into account.

Finally, the multi-channel case does not need a claim similar to the one in Step 1, but follows the proof of [21, Prop. 3], where again the additional constraint $q_i \in \mathcal{Q}_i$ can be easily taken into account. □
Proof of Lemma 2. The proof is given first for the single-channel case, and proceeds in two steps.

Step 1: The aim is to show that $\mu_i^* > 0$, $i = 1, \ldots, M$, where $\mu_i^*$ are the optimal Lagrange multipliers corresponding to constraint (14c). To this end, note that the additional convex constraint set $\mathcal{Y} \times \mathcal{Z}$ has a special structure, described by the following property: if $(y, z) \in \mathcal{Y} \times \mathcal{Z}$, then any $(y', z')$ with $y' \leq y$ and $z' \leq z$ also satisfies $(y', z') \in \mathcal{Y} \times \mathcal{Z}$. This property, together with Slater's constraint qualification, can be used to write the necessary optimality conditions [24, Sec. 5.2 and 5.4]

$$ \left.\frac{\partial L(u)}{\partial y_i}\right|_{(y^*, z^*, \nu^*, \lambda^*, \mu^*)} \leq 0, \quad \left.\frac{\partial L(u)}{\partial z_i}\right|_{(y^*, z^*, \nu^*, \lambda^*, \mu^*)} \leq 0, \quad \forall i \in \mathcal{M}. \tag{39} $$

Writing out the partial derivatives explicitly and summing them up, we arrive at a linear system of equations satisfied by the vector $\mu^* := [\mu_1^*, \ldots, \mu_M^*]^T$, which can be shown to have a positive solution, in the spirit of the proof of [4, Lemma 1(a)].

Step 2: The positivity of the optimal Lagrange multipliers for the local IpN constraints can be leveraged to show that the Lagrangian function is strictly convex with respect to $(y, z)$ [4, Lemma 1(b)]. This implies stability of the saddle points with respect to $(y, z)$.

Finally, the multi-channel case uses analogous arguments, noting that the additional convex constraint 
set under both types of interference requirements has the property previously described. □ 




Appendix B
Proofs of Propositions 1 and 2

Assumptions 1–3, referred to as "boundedness assumptions" hereafter, are supposed to hold throughout. Let $x^*$ and $\zeta^*$ denote an optimal primal solution and an optimal Lagrange multiplier vector for (O, respectively. For the analysis, the stability of the saddle points [cf. (12)] will not be used; instead, methods from [8] will be adapted in order to (a) include errors in the updates, and (b) accommodate the time-varying weights $\beta^{-i}$. A lemma about the successive iterates $x(t), \zeta(t)$ and $x(t+1), \zeta(t+1)$ generated by (15) follows, which is useful in proving results for both types of averaging.

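Lemma 5. Under Assumptions 1–3, the sequences $\{x(t)\}$ and $\{\zeta(t)\}$ satisfy, for all $x \in \mathcal{X}$, $\zeta \in \mathcal{D}$, and $t \geq 0$,

(i)
$$ \|\zeta(t+1) - \zeta\|^2 \leq \|\zeta(t) - \zeta\|^2 + 2\alpha\nabla_\zeta^T L(x(t),\zeta(t))(\zeta(t) - \zeta) + 2\alpha\bar{e}\|\zeta(t) - \zeta\| + \alpha^2(B_L + \bar{e})^2 \tag{40} $$

(ii)
$$ \|x(t+1) - x\|^2 \leq \|x(t) - x\|^2 - 2\alpha\big(L(x(t),\zeta(t)) - L(x,\zeta(t))\big) + 2\alpha\bar{r}\|x(t) - x\| + \alpha^2(B_L + \bar{r})^2. \tag{41} $$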

Proof of Lemma 5. (i) Using the nonexpansive property of the projection [24, Prop. 2.2.1], it follows from (15b) that for all $\zeta \in \mathcal{D}$,

$$ \|\zeta(t+1) - \zeta\|^2 \leq \|\zeta(t) + \alpha(\nabla_\zeta L(x(t),\zeta(t)) + e(t)) - \zeta\|^2 $$
$$ = \|\zeta(t) - \zeta\|^2 + 2\alpha\nabla_\zeta^T L(x(t),\zeta(t))(\zeta(t) - \zeta) + 2\alpha e^T(t)(\zeta(t) - \zeta) + \alpha^2\|\nabla_\zeta L(x(t),\zeta(t)) + e(t)\|^2. $$

Invoking the Cauchy-Schwarz inequality and the boundedness assumptions, (40) follows readily.

(ii) Again, using the nonexpansive property of the projection, it follows from (15a) that for all $x \in \mathcal{X}$,

$$ \|x(t+1) - x\|^2 \leq \|x(t) - \alpha(\nabla_x L(x(t),\zeta(t)) + r(t)) - x\|^2 $$
$$ = \|x(t) - x\|^2 - 2\alpha\nabla_x^T L(x(t),\zeta(t))(x(t) - x) - 2\alpha r^T(t)(x(t) - x) + \alpha^2\|\nabla_x L(x(t),\zeta(t)) + r(t)\|^2. $$

Invoking the Cauchy-Schwarz inequality and the boundedness assumptions, it is deduced that

$$ \|x(t+1) - x\|^2 \leq \|x(t) - x\|^2 - 2\alpha\nabla_x^T L(x(t),\zeta(t))(x(t) - x) + 2\alpha\bar{r}\|x(t) - x\| + \alpha^2(B_L + \bar{r})^2. $$

By convexity of $L(x,\zeta)$ with respect to $x$, it follows that

$$ \nabla_x^T L(x(t),\zeta(t))(x - x(t)) + L(x(t),\zeta(t)) \leq L(x,\zeta(t)). $$

Combining the last two inequalities yields (41). □
Now, the focus turns to averaging (16). To facilitate proving the results, we also consider the running average $\bar{\zeta}(t)$ of the Lagrange multipliers, together with $\bar{x}(t)$:

$$ \bar{\zeta}(t) := \frac{1}{t}\sum_{i=0}^{t-1}\zeta(i). \tag{42} $$

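The following lemma characterizes the running average of the Lagrangian function values at $x(i)$ and $\zeta(i)$, and will be used in the proof of Proposition 1.

Lemma 6. Under Assumptions 1–3, it holds for all $t \geq 1$ that

$$ \frac{1}{t}\sum_{i=0}^{t-1} L(x(i),\zeta(i)) - f^* \leq \frac{\|x(0) - x^*\|^2}{2\alpha t} + \bar{r}B_d + \frac{\alpha(B_L + \bar{r})^2}{2}. \tag{43} $$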
Proof of Lemma 6. It follows from (41) that for all $x \in \mathcal{X}$ and $i \geq 0$,

$$ L(x(i),\zeta(i)) - L(x,\zeta(i)) \leq \frac{\|x(i) - x\|^2 - \|x(i+1) - x\|^2}{2\alpha} + \bar{r}\|x(i) - x\| + \frac{\alpha(B_L + \bar{r})^2}{2}. \tag{44} $$

Averaging the latter (the first term telescopes), it follows that

$$ \frac{1}{t}\sum_{i=0}^{t-1} L(x(i),\zeta(i)) - \frac{1}{t}\sum_{i=0}^{t-1} L(x,\zeta(i)) \leq \frac{\|x(0) - x\|^2}{2\alpha t} + \frac{\bar{r}}{t}\sum_{i=0}^{t-1}\|x(i) - x\| + \frac{\alpha(B_L + \bar{r})^2}{2}. \tag{45} $$

Furthermore, the concavity of $L(x,\zeta)$ with respect to $\zeta$ implies that

$$ \frac{1}{t}\sum_{i=0}^{t-1} L(x^*,\zeta(i)) \leq L(x^*,\bar{\zeta}(t)) \leq L(x^*,\zeta^*) = f^*. \tag{46} $$

Using the latter, $x = x^*$, and the boundedness assumptions in (45), we obtain (43). □
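Proof of Proposition 1. (i) It follows from (40) that for all $\zeta \in \mathcal{D}$ and $i \geq 0$,

$$ \nabla_\zeta^T L(x(i),\zeta(i))(\zeta - \zeta(i)) \leq \frac{\|\zeta(i) - \zeta\|^2 - \|\zeta(i+1) - \zeta\|^2}{2\alpha} + \bar{e}\|\zeta(i) - \zeta\| + \frac{\alpha(B_L + \bar{e})^2}{2}. \tag{47} $$

Using the concavity (in fact, linearity) of $L(x,\zeta)$ with respect to $\zeta$, and $L(x(i),\zeta^*) \geq L(x^*,\zeta^*) = f^*$, we obtain

$$ (\zeta(i) - \zeta^*)^T\nabla_\zeta L(x(i),\zeta(i)) \leq L(x(i),\zeta(i)) - L(x(i),\zeta^*) \leq L(x(i),\zeta(i)) - f^*. \tag{48} $$

Upon combining (47) and (48), one arrives at

$$ (\zeta - \zeta^*)^T\nabla_\zeta L(x(i),\zeta(i)) = (\zeta - \zeta(i))^T\nabla_\zeta L(x(i),\zeta(i)) + (\zeta(i) - \zeta^*)^T\nabla_\zeta L(x(i),\zeta(i)) \tag{49} $$
$$ \leq \frac{\|\zeta(i) - \zeta\|^2 - \|\zeta(i+1) - \zeta\|^2}{2\alpha} + \bar{e}\|\zeta(i) - \zeta\| + \frac{\alpha(B_L + \bar{e})^2}{2} + L(x(i),\zeta(i)) - f^*. \tag{50} $$

Averaging the latter and invoking (43), we obtain for all $\zeta \in \mathcal{D}$

$$ \frac{1}{t}\sum_{i=0}^{t-1}(\zeta - \zeta^*)^T\nabla_\zeta L(x(i),\zeta(i)) \leq \frac{\|\zeta(0) - \zeta\|^2}{2\alpha t} + \frac{\bar{e}}{t}\sum_{i=0}^{t-1}\|\zeta(i) - \zeta\| + \frac{\alpha(B_L + \bar{e})^2}{2} + \frac{\|x(0) - x^*\|^2}{2\alpha t} + \bar{r}B_d + \frac{\alpha(B_L + \bar{r})^2}{2}. \tag{51} $$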
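Now, assuming $[g(\bar{x}(t))]^+ \neq 0$ (otherwise (24) holds trivially), consider the following vector:

$$ \hat{\zeta} := \zeta^* + \varrho\,\frac{[g(\bar{x}(t))]^+}{\|[g(\bar{x}(t))]^+\|}. \tag{52} $$

Recall that even if $\mathcal{D}$ is not bounded, the iterates are still assumed to be bounded (cf. Assumption 2), and in particular to lie in the set $\mathcal{D}_{\mathrm{ball}}$ given by (21). It is easy to verify using the triangle inequality that $\hat{\zeta} \in \mathcal{D}_{\mathrm{ball}}$. Noting that $\nabla_\zeta L(x(i),\zeta(i)) = g(x(i))$, a straightforward manipulation yields

$$ \frac{1}{t}\sum_{i=0}^{t-1}(\hat{\zeta} - \zeta^*)^T\nabla_\zeta L(x(i),\zeta(i)) = \frac{\varrho}{\|[g(\bar{x}(t))]^+\|}\,\big([g(\bar{x}(t))]^+\big)^T\,\frac{1}{t}\sum_{i=0}^{t-1} g(x(i)). \tag{53} $$

Since $g(x)$ is convex and $[g(\bar{x}(t))]^+$ is a nonnegative vector, the following implications hold:

$$ g(\bar{x}(t)) \leq \frac{1}{t}\sum_{i=0}^{t-1} g(x(i)) \;\Rightarrow\; \|[g(\bar{x}(t))]^+\|^2 \leq \big([g(\bar{x}(t))]^+\big)^T\,\frac{1}{t}\sum_{i=0}^{t-1} g(x(i)). \tag{54} $$

Consider next combining (51), (53), and (54), and taking the sup over $\mathcal{D}_{\mathrm{ball}}$; then

$$ \|[g(\bar{x}(t))]^+\| \leq \sup_{\zeta \in \mathcal{D}_{\mathrm{ball}}} \frac{1}{\varrho t}\sum_{i=0}^{t-1}(\zeta - \zeta^*)^T\nabla_\zeta L(x(i),\zeta(i)) $$
$$ \leq \sup_{\zeta \in \mathcal{D}_{\mathrm{ball}}} \frac{\|\zeta(0) - \zeta\|^2}{2\alpha\varrho t} + \frac{\bar{e}}{\varrho t}\sum_{i=0}^{t-1}\sup_{\zeta \in \mathcal{D}_{\mathrm{ball}}}\|\zeta(i) - \zeta\| + \frac{\alpha(B_L + \bar{e})^2}{2\varrho} + \frac{\|x(0) - x^*\|^2}{2\alpha\varrho t} + \frac{\bar{r}B_d}{\varrho} + \frac{\alpha(B_L + \bar{r})^2}{2\varrho} $$

which, using $\|\zeta\| \leq B_\zeta$ for $\zeta \in \mathcal{D}_{\mathrm{ball}}$ and $\|\zeta(i)\| \leq B_\zeta$, readily gives (24).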

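(ii) By using the convexity of $f(x)$, adding and subtracting the terms $\zeta^T(i)g(x(i))$, and invoking the definition of the Lagrangian function (10) and (43), we obtain for all $t \geq 1$

$$ f(\bar{x}(t)) \leq \frac{1}{t}\sum_{i=0}^{t-1} f(x(i)) = \frac{1}{t}\sum_{i=0}^{t-1} L(x(i),\zeta(i)) - \frac{1}{t}\sum_{i=0}^{t-1}\zeta^T(i)g(x(i)) \tag{55} $$

$$ f(\bar{x}(t)) - f^* \leq \frac{\|x(0) - x^*\|^2}{2\alpha t} + \bar{r}B_d + \frac{\alpha(B_L + \bar{r})^2}{2} - \frac{1}{t}\sum_{i=0}^{t-1}\zeta^T(i)g(x(i)). \tag{56} $$

The concavity of the Lagrangian function with respect to $\zeta$ implies that

$$ \nabla_\zeta^T L(x(i),\zeta(i))(\zeta - \zeta(i)) + L(x(i),\zeta(i)) \geq L(x(i),\zeta). \tag{57} $$

Using the latter in (40) with $\zeta = 0$, and noting that $L(x(i),\zeta(i)) - L(x(i),0) = \zeta^T(i)g(x(i))$, it follows that for all $i \geq 0$,

$$ \|\zeta(i+1)\|^2 \leq \|\zeta(i)\|^2 + 2\alpha\zeta^T(i)g(x(i)) + 2\alpha\bar{e}\|\zeta(i)\| + \alpha^2(B_L + \bar{e})^2. \tag{58} $$

Summing the latter for $i = 0, 1, \ldots, t-1$, dividing by $2\alpha t$, and using the boundedness assumptions, we obtain

$$ -\frac{1}{t}\sum_{i=0}^{t-1}\zeta^T(i)g(x(i)) \leq \frac{\|\zeta(0)\|^2}{2\alpha t} + \bar{e}B_\zeta + \frac{\alpha(B_L + \bar{e})^2}{2}. \tag{59} $$

Using (59) in (56) yields (25).

(iii) It holds for all $t \geq 1$ that

$$ f(\bar{x}(t)) = L(\bar{x}(t),\zeta^*) - \zeta^{*T}g(\bar{x}(t)) \geq f^* - \zeta^{*T}g(\bar{x}(t)) \tag{60} $$

and also

$$ -\zeta^{*T}g(\bar{x}(t)) \geq -\zeta^{*T}[g(\bar{x}(t))]^+ \geq -\|\zeta^*\|\,\|[g(\bar{x}(t))]^+\|. \tag{61} $$

Using (61) in (60), together with $\|\zeta^*\| \leq B_\zeta^*$, (26) follows. □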
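Now, the focus is turned to the running average in (17). It will be helpful to also consider the running average of the Lagrange multipliers,

$$ \bar{\zeta}_\beta(t) := \frac{\sum_{i=0}^{t-1}\beta^{t-1-i}\zeta(i)}{\sum_{i=0}^{t-1}\beta^{t-1-i}} = \frac{1}{S_t}\sum_{i=0}^{t-1}\beta^{-i}\zeta(i), \quad t = 1, 2, \ldots \tag{62} $$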

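The following lemma is a result about summations that appear frequently in the analysis of averaging (17), and will be used in subsequent proofs.

Lemma 7. With $S_t$ as in (27), it holds for all $x \in \mathcal{X}$, $\zeta \in \mathcal{D}$, and $t \geq 2$ that

(i)
$$ \frac{1}{S_t}\sum_{i=0}^{t-1}\beta^{-i}\big(\|x(i) - x\|^2 - \|x(i+1) - x\|^2\big) \leq \frac{\|x(0) - x\|^2}{S_t} + \frac{1}{S_t}\sum_{i=1}^{t-1}\big(\beta^{-i} - \beta^{-(i-1)}\big)\|x(i) - x\|^2 \tag{63} $$

(ii)
$$ \frac{1}{S_t}\sum_{i=0}^{t-1}\beta^{-i}\big(\|\zeta(i) - \zeta\|^2 - \|\zeta(i+1) - \zeta\|^2\big) \leq \frac{\|\zeta(0) - \zeta\|^2}{S_t} + \frac{1}{S_t}\sum_{i=1}^{t-1}\big(\beta^{-i} - \beta^{-(i-1)}\big)\|\zeta(i) - \zeta\|^2 \tag{64} $$

(iii)
$$ \frac{1}{S_t}\sum_{i=1}^{t-1}\big(\beta^{-i} - \beta^{-(i-1)}\big) = \frac{(1-\beta)(1-\beta^{t-1})}{1-\beta^t}. \tag{65} $$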



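Proof of Lemma 7. (i) By simply rearranging terms in the summation, it holds for all $t \geq 2$ that

$$ \frac{1}{S_t}\sum_{i=0}^{t-1}\beta^{-i}\big(\|x(i) - x\|^2 - \|x(i+1) - x\|^2\big) = \frac{\|x(0) - x\|^2}{S_t} + \frac{1}{S_t}\sum_{i=1}^{t-1}\big(\beta^{-i} - \beta^{-(i-1)}\big)\|x(i) - x\|^2 - \frac{\beta^{-(t-1)}}{S_t}\|x(t) - x\|^2 \tag{66} $$

from which (63) follows easily, since the last term is nonpositive.

(ii) This is identical to (i).

(iii) The sum is telescopic. Using $S_t = \beta^{-(t-1)}(1-\beta^t)/(1-\beta)$, it follows for all $t \geq 2$ that

$$ \frac{1}{S_t}\sum_{i=1}^{t-1}\big(\beta^{-i} - \beta^{-(i-1)}\big) = \frac{\beta^{-(t-1)} - 1}{S_t} = \frac{(1-\beta)\,\beta^{t-1}}{1-\beta^t}\big(\beta^{-(t-1)} - 1\big) = \frac{(1-\beta)(1-\beta^{t-1})}{1-\beta^t} \tag{67} $$

which is exactly (65). □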
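The rearrangement in (66) and the inequality (63) can be spot-checked on random nonnegative sequences standing in for $\|x(i) - x\|^2$. This is only a numerical sanity check under arbitrary sample values, with illustrative function names, not a substitute for the proof.

```python
import numpy as np


def weighted_telescoping(a, beta):
    """LHS of (63): (1/S_t) sum_{i=0}^{t-1} beta^(-i) (a_i - a_{i+1}), with a = [a_0, ..., a_t]."""
    t = len(a) - 1
    w = beta ** (-np.arange(t))          # beta^(-i), i = 0..t-1
    return float(w @ (a[:-1] - a[1:]) / w.sum())


def _parts(a, beta):
    """Pieces of the rearranged sum (66): a_0, the middle sum, the last term, and S_t."""
    t = len(a) - 1
    S_t = np.sum(beta ** (-np.arange(t)))
    i = np.arange(1, t)
    middle = float((beta ** (-i) - beta ** (-(i - 1))) @ a[1:t])
    last = beta ** (-(t - 1)) * a[t]
    return a[0], middle, last, S_t


def rearranged(a, beta):
    """RHS of (66), including the final nonpositive term."""
    a0, middle, last, S_t = _parts(a, beta)
    return (a0 + middle - last) / S_t


def bound_63(a, beta):
    """RHS of (63): the rearranged sum with the nonpositive last term dropped."""
    a0, middle, _, S_t = _parts(a, beta)
    return (a0 + middle) / S_t
```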
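The convergence analysis is patterned after the previous one for averaging (16), generalizing it to account for the varying weights $\beta^{-i}$. In the ensuing proofs, only the main steps of this generalization are summarized, starting with the following counterpart of Lemma 6.

Lemma 8. For all $t \geq 2$, it holds that

$$ \frac{1}{S_t}\sum_{i=0}^{t-1}\beta^{-i}L(x(i),\zeta(i)) - f^* \leq \frac{\|x(0) - x^*\|^2}{2\alpha S_t} + \frac{B_d^2}{2\alpha}\,\frac{(1-\beta)(1-\beta^{t-1})}{1-\beta^t} + \bar{r}B_d + \frac{\alpha(B_L + \bar{r})^2}{2}. \tag{68} $$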

Proof of Lemma 8. Multiplying (44) by $\beta^{-i}$, summing for $i = 0, 1, \ldots, t-1$, and dividing by $S_t$, it follows that

$$ \frac{1}{S_t}\sum_{i=0}^{t-1}\beta^{-i}L(x(i),\zeta(i)) - \frac{1}{S_t}\sum_{i=0}^{t-1}\beta^{-i}L(x,\zeta(i)) \leq \frac{1}{2\alpha S_t}\sum_{i=0}^{t-1}\beta^{-i}\big(\|x(i) - x\|^2 - \|x(i+1) - x\|^2\big) + \frac{\bar{r}}{S_t}\sum_{i=0}^{t-1}\beta^{-i}\|x(i) - x\| + \frac{\alpha(B_L + \bar{r})^2}{2}. \tag{69} $$

Using the concavity of $L(x,\zeta)$ with respect to $\zeta$ as in the proof of Lemma 6, in combination with Lemma 7 and the boundedness assumptions, (68) follows readily. □

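Next, Proposition 2 is proved using Lemma 8.

Proof of Proposition 2. (i) Multiplying (50) by $\beta^{-i}$, summing for $i = 0, 1, \ldots, t-1$, dividing by $S_t$, and invoking Lemmas 7 and 8, it follows that for all $\zeta \in \mathcal{D}$,

$$ \frac{1}{S_t}\sum_{i=0}^{t-1}\beta^{-i}(\zeta - \zeta^*)^T\nabla_\zeta L(x(i),\zeta(i)) \leq \frac{\|\zeta(0) - \zeta\|^2}{2\alpha S_t} + \frac{1}{2\alpha S_t}\sum_{i=1}^{t-1}\big(\beta^{-i} - \beta^{-(i-1)}\big)\|\zeta(i) - \zeta\|^2 + \frac{\bar{e}}{S_t}\sum_{i=0}^{t-1}\beta^{-i}\|\zeta(i) - \zeta\| + \frac{\alpha(B_L + \bar{e})^2}{2} $$
$$ \quad + \frac{\|x(0) - x^*\|^2}{2\alpha S_t} + \frac{B_d^2}{2\alpha}\,\frac{(1-\beta)(1-\beta^{t-1})}{1-\beta^t} + \bar{r}B_d + \frac{\alpha(B_L + \bar{r})^2}{2}. \tag{70} $$

Consider the following vector:

$$ \hat{\zeta} := \zeta^* + \varrho\,\frac{[g(\bar{x}_\beta(t))]^+}{\|[g(\bar{x}_\beta(t))]^+\|}. \tag{71} $$

The rest of the proof continues as the proof of Proposition 1. In the present case, the boundedness assumptions and Lemma 7 are invoked after taking the sup of the right-hand side of (70) over $\zeta \in \mathcal{D}_{\mathrm{ball}}$.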
(ii), (iii) Following the proof of part (ii) of Proposition 1, a bound on the sum $-\frac{1}{S_t}\sum_{i=0}^{t-1}\beta^{-i}\zeta^T(i)g(x(i))$ can be obtained by multiplying (58) by $\beta^{-i}$, summing for $i = 0, 1, \ldots, t-1$, dividing by $2\alpha S_t$, and invoking the boundedness assumptions and Lemma 7. Part (iii) is identical to that of Proposition 1. □

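Finally, Corollary 1 and Lemma 4 are proved next.

Proof of Corollary 1. If the set $\mathcal{D}_{\mathrm{box}}$ is used for the projection, then (51) holds for all $\zeta \in \mathcal{D}_{\mathrm{box}}$. But $\mathcal{D}_{\mathrm{ball}} \subset \mathcal{D}_{\mathrm{box}}$ and $\hat{\zeta} \in \mathcal{D}_{\mathrm{ball}}$ [cf. (52)], so it is still possible to take the sup over $\zeta \in \mathcal{D}_{\mathrm{ball}}$ in the bound following (54). The argument also holds for the right-hand side of (70). □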
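Proof of Lemma 4. Assumption 2 implies that the sequences $\{\bar{x}(t)\}$ and $\{\bar{x}_\beta(t)\}$ indeed have limit points. Focusing first on averaging (16), assume that the claim is not true, i.e., suppose that $\mathrm{dist}(\hat{x}, \mathcal{X}^*) > \epsilon$. There is a subsequence $\{\bar{x}(t_k)\}$, $k \in \mathbb{N}$, such that $\bar{x}(t_k) \to \hat{x}$ as $k \to \infty$. Due to the continuity of the distance function, $\mathrm{dist}(\hat{x}, \mathcal{X}^*) > \epsilon$ implies that there are an integer $k'$ and an $\epsilon' > 0$ such that

$$ \mathrm{dist}(\bar{x}(t_k), \mathcal{X}^*) > \epsilon + \epsilon', \quad \forall k \geq k'. \tag{72} $$

Fix an integer $l$ such that $t_l > t'$ and $l \geq k'$. Now, since $\mathcal{X}^*$ is closed and convex (cf. Assumption 1), the distance function is convex [24, Prop. 2.2.1]. It then follows from (72) that for any $k > l$,

$$ \epsilon + \epsilon' < \mathrm{dist}(\bar{x}(t_k), \mathcal{X}^*) \leq \frac{1}{t_k}\sum_{i=0}^{t_k-1}\mathrm{dist}(x(i), \mathcal{X}^*) = \frac{1}{t_k}\sum_{i=0}^{t_l-1}\mathrm{dist}(x(i), \mathcal{X}^*) + \frac{1}{t_k}\sum_{i=t_l}^{t_k-1}\mathrm{dist}(x(i), \mathcal{X}^*). \tag{73} $$

Using that $\mathrm{dist}(x(i), \mathcal{X}^*) < \epsilon$ for all $i \geq t'$ in the latter, it follows that

$$ \epsilon + \epsilon' < \frac{1}{t_k}\sum_{i=0}^{t_l-1}\mathrm{dist}(x(i), \mathcal{X}^*) + \frac{t_k - t_l}{t_k}\,\epsilon \leq \frac{1}{t_k}\sum_{i=0}^{t_l-1}\mathrm{dist}(x(i), \mathcal{X}^*) + \epsilon. \tag{74} $$

Taking $k \to \infty$ leads to the contradiction $\epsilon + \epsilon' \leq \epsilon$. For the averaging (17), the proof is analogous. □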



